ABSTRACT

The development of a prototype computerized,
criterion-referenced test of certain nonperformance musical behaviors
for administration to entering students in music education at a state
university was undertaken. After the formulation of statements of
competencies to serve as conceptual bases for the formulation of test
items, items were constructed for 12 subtests. Four subtests were
selected for programming. The test was programmed sequentially for
the IBM 1500 Instructional System. The student's score for each
subtest was the number of items actually answered correctly plus the
number of items for which a correct answer was assumed. The
computerized test was administered to 32 undergraduate music
education students. A parallel conventional version of the test was
given to 28 other students, and a comparative analysis was made. The
tests were not shown to be equivalent. Quantitative inadequacies may
be explained by the discrepancies between estimated orders of item
difficulty and the true orders of item difficulty for the particular
students tested. From a qualitative standpoint, the computerized test
performs adequately; with refinement, it could provide a convenient,
rapid assessment of students in regard to certain expected
nonperformance musical competencies. (Author/CK)

Purpose

The purpose of the study was to develop a prototype computerized,
criterion-referenced test of certain nonperformance musical behaviors
for administration to entering students in music education at The
Pennsylvania State University, with the expectation that the test could
provide a pattern for development in similar situations.

Procedures 

After the formulation of statements of competencies to serve as
conceptual bases for the formulation of criterion-referenced test items,
test items were constructed for twelve subtests. A total of 783 such
items were administered to music and music education undergraduates at
seven Pennsylvania institutions of higher education, including Penn
State. Item difficulty indices were computed, and twenty-item scales,
arranged in order of difficulty, were selected for each subtest. Four
subtests were selected for programming.
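
The item-analysis step described above may be illustrated with a
minimal sketch in Python. It assumes that an item difficulty index is
the proportion of examinees answering the item correctly, which is the
conventional definition; the formula actually used is not stated here.
The function names, the toy response data, and the evenly spaced
selection rule in `pick_scale` are hypothetical and are not taken from
the study.

```python
from typing import Dict, List, Sequence


def difficulty_indices(responses: Dict[str, Sequence[bool]]) -> Dict[str, float]:
    """Return the proportion of correct responses for each item.

    Assumption: the difficulty index is the classical proportion correct,
    so a HIGHER value means an EASIER item.
    """
    return {
        item: sum(marks) / len(marks)
        for item, marks in responses.items()
        if len(marks) > 0
    }


def order_by_difficulty(indices: Dict[str, float]) -> List[str]:
    """List item labels from easiest to hardest (ties broken by label)."""
    return sorted(indices, key=lambda item: (-indices[item], item))


def pick_scale(ordered: Sequence[str], length: int = 20) -> List[str]:
    """Hypothetical selection rule: take evenly spaced items from the ordering."""
    n = len(ordered)
    if n <= length or length < 2:
        return list(ordered[:length])
    return [ordered[(i * (n - 1)) // (length - 1)] for i in range(length)]


# Toy data: three items, four examinees each (True = correct response).
responses = {
    "item_a": [True, True, True, False],
    "item_b": [True, False, False, False],
    "item_c": [True, True, False, False],
}
indices = difficulty_indices(responses)
ordered = order_by_difficulty(indices)  # easiest item first
```

Ordering the items from easiest to hardest is what allows a later
procedure to credit a student with easier items that were never
presented.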

The test was programmed for the IBM 1500 Instructional System in a
sequential or incremental manner. In accordance with the programming
strategy adopted for the final administration, a student began a subtest
with the fourth item of the twenty-item scale. A correct response
branched the student ahead to the eighth item, the assumption being that
the student would have answered the first, second, and third items
correctly because they were of less difficulty than the fourth item. The
student proceeded in increments of four until the twentieth item was
answered correctly or an initial erroneous response occurred. An
initial error caused a reverse branch of three items. From that point, the
student continued the subtest in linear fashion until the end of the
subtest was reached, three erroneous responses occurred in succession,
or a total of five erroneous responses had occurred. The student's
score for each subtest was the number of items actually answered
correctly plus the number of items for which a correct answer was assumed.
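
The branching strategy described above can be sketched as a short
simulation in Python. This is an illustration under stated assumptions,
not the IBM 1500 program used in the study. The function name
`run_subtest` and its parameters are hypothetical. Two details are not
specified in the description and are assumed here: the initial error
counts toward both stopping tallies, and the item on which the initial
error occurred is not presented a second time during the linear phase.

```python
from typing import Callable, Dict, List, Optional


def run_subtest(
    answers_correctly: Callable[[int], bool],
    scale_length: int = 20,
    step: int = 4,
    back: int = 3,
    max_run: int = 3,
    max_errors: int = 5,
) -> Dict[str, object]:
    """Simulate one subtest; items are numbered 1..scale_length, easiest first."""
    if scale_length % step != 0:
        raise ValueError("scale_length must be a multiple of step")

    actual = 0                    # items presented and answered correctly
    assumed = 0                   # skipped easier items credited as correct
    presented: List[int] = []     # order in which items were shown
    missed: Optional[int] = None  # item on which the initial error occurred

    # Branching phase: items 4, 8, 12, ... until an error or the last item.
    position = step
    while position <= scale_length:
        presented.append(position)
        if not answers_correctly(position):
            missed = position
            break
        actual += 1
        assumed += step - 1       # the three easier items that were skipped
        position += step

    if missed is not None:
        errors = 1                # assumption: the initial error is counted
        run = 1                   # current run of consecutive errors
        position = missed - back  # reverse branch of three items
        # Linear phase: one item at a time until a stopping rule applies.
        while position <= scale_length and run < max_run and errors < max_errors:
            if position != missed:  # assumption: missed item is not repeated
                presented.append(position)
                if answers_correctly(position):
                    actual += 1
                    run = 0
                else:
                    errors += 1
                    run += 1
            position += 1

    return {
        "score": actual + assumed,
        "actual": actual,
        "assumed": assumed,
        "presented": presented,
    }


# A hypothetical student who can answer items 1 through 10 only.
partial = run_subtest(lambda item: item <= 10)
# A hypothetical student who answers every item correctly.
perfect = run_subtest(lambda item: True)
```

Under these assumptions the student who answers everything correctly
sees only five items yet receives a score of twenty, which shows how
much the score depends on the estimated order of item difficulty being
correct for the students tested.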

Results and Conclusions 

In October, 1970, the computerized test was administered to
thirty-two undergraduate music education students at Penn State. A parallel
conventional version of the test was administered to twenty-eight other
students, and the two versions were compared with an
analysis-of-variance procedure for equivalency. The tests were not shown
to be equivalent, although their mean scores did not, with one exception,
differ significantly. Quantitative inadequacies may be explained by the
discrepancies between estimated orders of item difficulty and the true
orders of item difficulty for the particular students tested.

From a qualitative standpoint, the computerized test performs
adequately. Refinement is indicated by reordering of the test items on
the basis of estimates of item difficulty obtained from larger groups of
students. Lengthening the test to include areas representative of more
behaviors might also be in order. With such refinement, the test could
provide a convenient, rapid assessment of the status of music education
students in regard to certain expected nonperformance musical
competencies.

CHAPTER I



INTRODUCTION 
PURPOSE OF THE STUDY 

The basic purpose of this study was to develop a prototype
computerized criterion-referenced test for measuring competencies in
certain nonperformance musical behaviors present in undergraduate
students commencing their course of study in music education. The
prototype was developed utilizing students and resources of The Pennsylvania
State University at University Park, Pennsylvania, and six other
Pennsylvania institutions of higher education.

BACKGROUND 

College students pursuing a course of study in music or music
education include in their program the study of nonperformance areas,
i.e., areas such as music theory, music history, and music literature
which are not directly concerned with vocal or instrumental performance.
Adequate musical preparation for entry into the profession of music
education involves more than the development of technical vocal and
instrumental skills. The formal music education necessary for the
prospective teacher and performer should include thorough theoretical,
historical, and stylistic study.^1

The standards and expectancies of colleges and universities
regarding competence in nonperformance areas vary; learners vary. If a
particular college music or music education department could reliably
measure its own entering students' nonperformance musical behaviors and
compare the measurements with the particular expectancies of the
college, certain curricular problems might be alleviated. Needed remedial
learning experiences for those students identified as not meeting
minimal expectancies could be indicated. Qualitative descriptions and
analyses of nonperformance musical behaviors could be a basis for
advanced course placement and exemption from certain courses.

^1 James Jorgenson, "Advice to the Potential College Music Major,"
Instrumentalist, XXII (April, 1968), 38-39.

Although nothing in this area had been done prior to the research
reported herein, it appeared that a computer-based instructional system,
designed for rapid processing of student responses to interrogative
stimuli, could serve as a means of measuring with speed, flexibility,
and efficiency the extent to which expectancies in nonperformance
musical behaviors were met by a given student. Description and analysis of
student nonperformance musical behaviors could be facilitated by
programming a computer to serve as a device for the measurement of
proficiency in such behaviors.

Given sufficient breadth and depth of observation, such a measuring
device could serve as a diagnostic achievement test because it would
purport to measure a certain pattern of musical achievement. The
current lack of music tests which serve as diagnostic tools has been cited
by Lehman.^2 At present, it is unlikely that existing published music
tests adequately serve as diagnostic achievement tests for comparison
of observed nonperformance musical behaviors with expected
nonperformance musical behaviors.

^2 Paul R. Lehman, Tests and Measurements in Music (Englewood
Cliffs, New Jersey: Prentice-Hall, Inc., 1968), p. 86.

INADEQUACY OF PRESENT TESTING TECHNIQUES



Test items intended to measure musical behavior of students at a
particular institution of higher learning should be based upon the 
goals, standards, and criteria for success in that institution. The 
particular objectives of one institution may be considered to be some- 
what different from another. Tests for similar purposes in various 
music and music education departments may have similar formats, but 
content and sequencing of items should be free to vary. It is apparent 
that tests prepared on a national or regional basis with a rigid content 
and order of items may fail to reflect the instructional objectives and 
emphases of a particular music or music education faculty. 

Music educators lack a national consensus as to what specific
musical outcomes are expected as a result of instruction in music. No
existing published achievement test is likely to receive widespread
acceptance as a measurement tool because the profession does not appear
to have a sufficient degree of consensus with regard to what musical
behaviors are desirable. Consequently, it was proposed to begin the
development of the proposed measuring device by constructing a test of
certain nonperformance musical behaviors for a specific institution
where a consensus of institutional goals was obtainable. The pattern of
development that has evolved is adaptable for application elsewhere.

^3 Lehman, pp. 57-58.

NEED FOR CRITERION-REFERENCED TESTING

It was proposed to develop a test that would assess nonperformance
musical behaviors in relation to criterion behaviors. The behaviors to
be observed and measured were to be specified and stated in the form of
observable student objectives. The original intent was that these
objectives were to represent the minimal amount of competence that
entering music or music education students at a particular institution
could be expected to display as evidence of criterion attainment.
Although the objectives were eventually expressed in terms of observable
competencies which an undergraduate student in music education should
attain in the course of his pre-professional training, rather than in
numerical expressions of desirable entering competencies, the
specification of the behaviors to be measured as the initial phase of test
development was in accordance with contemporary principles of test
development.^4

The distinction between norm-referenced and criterion-referenced
measures is of fundamental importance. Glaser explains that
two kinds of primary information, differing principally in the
standard used as a reference, are obtainable from an achievement test. The
relative ordering of individuals with respect to their test performance
is the type of information provided by a norm-referenced measure, a
measure dependent upon a relative standard for relating individuals to
each other. Another type of information, the degree to which the
student has attained criterion performance, is provided by a
criterion-referenced measure, which is dependent upon an absolute standard
of quality to represent criterion performance.^5

^4 Robert Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," American Psychologist, XVIII
(August, 1963), 520; Robert Glaser and David J. Klaus, "Proficiency
Measurement: Assessing Human Performance," Psychological Principles in
System Development, Robert M. Gagné, editor (New York: Holt, Rinehart,
and Winston, Inc., 1962), p. 430; C. M. Lindvall, Testing and
Evaluation: An Introduction (New York: Harcourt, Brace, and World,
Inc., 1961), pp. 23-25.

A criterion-referenced measure provides explicit information
regarding an individual's ability to perform a task. The individual's
score indicates the degree of competence he has attained in relation to
an ordered continuum of expected behaviors rather than in relation to
the performance of others.^6

A norm-referenced test compares individuals with each other rather
than with a behavioral standard; it indicates only how much a student
knows with respect to other students. The shortcomings of ordinary
norm-referenced achievement tests for assessment of learning have been
recognized by various authorities in regard to the ongoing national
assessment program.^7
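
The contrast between the two kinds of interpretation may be made
concrete with a brief sketch in Python. The scores, the comparison
groups, the criterion value of 15, and the function names are all
hypothetical and serve only to illustrate the distinction; none is
drawn from the study.

```python
from typing import Sequence


def percentile_rank(score: float, group_scores: Sequence[float]) -> float:
    """Norm-referenced view: standing relative to a comparison group."""
    below = sum(1 for s in group_scores if s < score)
    equal = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(group_scores)


def meets_criterion(score: float, criterion: float) -> bool:
    """Criterion-referenced view: standing relative to a fixed standard."""
    return score >= criterion


score = 12                         # one student's hypothetical score
criterion = 15                     # hypothetical institutional standard
strong_group = [8, 10, 12, 14, 16]
weak_group = [4, 6, 8, 10, 12]

rank_in_strong = percentile_rank(score, strong_group)
rank_in_weak = percentile_rank(score, weak_group)
attained = meets_criterion(score, criterion)
```

The same score yields a different percentile rank in each comparison
group, while the judgment against the criterion does not change with
the group in which the student happens to be tested.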

Although Cronbach defines a test as "a systematic procedure for
comparing the behavior of two or more persons,"^8 the comparison of one
person to another was not the purpose of the test developed in this
study. Rather, the purpose was perceived as a comparison of a person's
existing observed nonperformance musical behaviors with desired
nonperformance musical behaviors as represented by test items that demonstrate
attainment of criterion competencies, i.e., a criterion-referenced test.

^5 Glaser, 519.

^6 Glaser and Klaus, p. 422.

^7 Caroline Hightower, How Much Are Students Learning? Plans for a
National Assessment of Education (Ann Arbor, Michigan: The Committee
on Assessing the Progress of Education, 1968), p. 6.

^8 Lee J. Cronbach, Essentials of Psychological Testing (2nd ed.;
New York: Harper and Row, 1960), p. .

Tests which presently exist in music, although meeting Cronbach's
definition, do not appear to compare observed with expected behavior.
This is not unexpected since the assessment procedures conventionally
used in development of the typical standardized test in any area of
knowledge do not include a method for assessing student performance in
terms of instructional objectives. Existing achievement tests appear to
have as their purpose the demonstration of the great range of individual
differences in behavior. Continued refinement of norm-referenced tests
to maximize their discriminatory power is not likely to be worthwhile
for the purpose of measuring achievement in terms of expected behaviors.^9
Comparison of the observed with the expected requires criteria for that
which is expected, not discrimination among those who are observed.

NEED FOR THE APPLICATION OF COMPUTER TECHNOLOGY 

Technological developments have made it possible to create new
testing materials and present them in a variety of ways. A computer may
be programmed to present varied test stimuli, to record and evaluate the
responses, and to provide a printed summary and interpretation of each
individual test performance in relation to a behavioral standard. Tyler
states:

Now that high-speed computers and electronic data
processing make individual diagnosis, recording, and
treatment feasible, teachers do not have appropriate
evaluation instruments to guide greater individualization
of instruction. We are still so obsessed with the ranking
of individuals on the basis of scores that we have not
developed adequately the tools and procedures required.
Theory and practice need to be reexamined in terms of
present conditions and opportunities.^10

^9 Robert M. Gagné, The Conditions of Learning (New York: Holt,
Rinehart, and Winston, Inc., 1966), p. 258.

It was the researcher's belief that computer technology could be 
used effectively to bring new techniques to bear upon the problem of 
the measurement of nonperformance musical behaviors. The technique of 
sequential or incremental testing, whereby the student's response 
history is utilized to determine the order of presentation of test items 
to an individual student, appeared particularly promising. Furthermore,
the computer can smoothly and rapidly present a variety of musical 
stimuli in an individualized manner by coordinating the appropriate 
auxiliary apparatus. 



SUMMARY OF THE BACKGROUND OF THE STUDY 



A lack of a suitable measuring instrument was perceived for
comparing certain nonperformance musical behaviors of entering college
music and music education students with expected levels of competence.
Norm-referenced tests that discriminate between individuals were viewed
as inappropriate for the purpose. It was therefore proposed to use
computerized presentation and analysis to rapidly administer a
criterion-referenced test to evaluate the behavior of entering freshman
music and music education majors in certain nonperformance areas in
relation to defined expectancies.

^10 Ralph W. Tyler, "Changing Concepts of Educational Evaluation,"
Perspectives of Curriculum Evaluation, Ralph W. Tyler, Robert M. Gagné,
and Michael Scriven, editors (Chicago: Rand McNally and Company, 1967),
p. 17.

CHAPTER II



REVIEW OF SELECTED LITERATURE 

The purpose of this chapter is to provide a conceptual basis for
the work that was undertaken by illustrating research and opinion that 
had been previously applied to the development of tests in nonperform- 
ance areas for entering students, criterion-referenced measures, and 
feasibility of computerized testing. 

A substantial amount of literature has been developed regarding
tests and measurements in music. Lehman^1 and Whybrew^2 have written
textbooks discussing problems inherent in music testing, certain
statistical concepts, the classification of tests as aptitude or achievement
measures, and published standardized tests in music. As portions of
psychology of music texts, psychologists such as Farnsworth^3 and Lundin^4
have reviewed tests and discussed problems in the context of definition
and measurement of musical behavior. The controversy between the
Seashore atomistic view of musical talent and the Mursell general view
of musical talent with implications for testing has been widely
reported.^5 A comprehensive listing of literature pertinent to the
general topic of testing in music, compiled and categorized by Lehman,
provides a total of 298 discrete entries, including psychological tests,
reviews, texts containing sections on music tests, experimental studies,
studies of published tests, and works regarding the status of testing.^6

^1 Paul R. Lehman, Tests and Measurements in Music (Englewood Cliffs,
New Jersey: Prentice-Hall, Inc., 1968), pp. 1-99.

^2 William E. Whybrew, Measurement and Evaluation in Music (Dubuque,
Iowa: The William C. Brown Company, 1962), pp. 1-184.

^3 Paul R. Farnsworth, The Social Psychology of Music (New York:
Holt, Rinehart, and Winston, Inc., 1958), pp. 1-304.

^4 Robert W. Lundin, An Objective Psychology of Music (2nd ed.;
New York: Ronald Press, 1967), pp. 1-345.

^5 Lehman, pp. 40-41.

General tests and measurements literature, though related, is not 
as directly pertinent to the developmental research reported herein as 
are certain more specific materials. Literature regarding standardized 
tests of music, musical aptitude and its definition, and the
philosophical justification for testing is only peripherally related to the
conceptual basis of the research. The literature critical to the pre- 
sent study has been devoted to (1) development of tests for diagnosis of 
difficulties of entering music and music education students in nonper- 
formance areas, (2) the feasibility of the proposed computerized 
approach, and (3) criterion-referenced measures. 

CERTAIN PRIOR DEVELOPMENTS REGARDING TESTING 
IN NONPERFORMANCE AREAS 

Ball developed a test measuring responses to elements of rhythm,
melody, and harmony, singly and in combination, to serve as a college
entrance test of music. The items were administered for trial purposes
to equal samples of high musical ability and low musical ability
students, with theory grades and teachers' ratings as the basis of ability
determination. The final test items were selected on the basis of their
power to discriminate between the high and low groups, rather than on
the basis of how well they represented expected criterion performance.
Ball's test does not appear to be criterion-referenced when his
procedure for item selection is considered.^7

^6 Paul R. Lehman, "A Selected Bibliography of Works on Music
Testing," Journal of Research in Music Education, XVII (Winter, 1969),
428-442.

Perry constructed a test to be administered to entering freshmen
for purposes of guiding, counseling, placing, and selecting the students
in and for theory classes. After one semester of theory instruction, a
correlation coefficient of .60 was found between scores on seven
selected predictor portions of the Perry test and criterion scores obtained
from proficiency examinations in theory. Perry's purpose was to make
a comparison of the abilities of various predictive measures to predict
examination scores rather than comparing observed behaviors with
criterion performance.^8

Mansur devised a Wind Instrumentalist Inventory Scale for use as a
paper-and-pencil objective test of achievement related to musical
performance. He suggested that it could be administered to entering
freshmen as a predictive and screening device for college and university
instrumental groups. This performance-related test discriminates
between individuals rather than ascertaining the extent to which the
objectives of an institution have been met.^9



^7 Charles Hershel Ball, "The Application of an Empirical Method to
the Construction of a College Entrance Test in Music" (unpublished
doctoral dissertation, George Peabody College for Teachers, 1964),
Dissertation Abstracts, XXVI (July-August, 1965), 404.

^8 William Wade Perry, "A Comparative Study of Selected Tests for
Predicting Proficiency in Collegiate Music Theory" (unpublished doctoral
dissertation, North Texas State University, 1965), Dissertation
Abstracts, XXVI (January, 1966), 3995-3996.

^9 Paul Max Mansur, "An Objective Performance-Related Music
Achievement Test" (unpublished doctor's dissertation, The University of
Oklahoma, 1965).

The Gordon Musical Aptitude Profile, a norm-referenced, published,
standardized test of musical aptitude,^10 was used by Hatfield to
diagnose tonal and rhythmic strengths and weaknesses in a correlational
study using South Dakota State University band students. The highest
intercorrelations were found between the "Tonal Imagery" section of the
Gordon test and certain tonal-creative behaviors related to instrumental
performance; the rhythmic results were not as clear. Criterion
behaviors appropriate to band students apparently were not taken into
account.^11

Edwin Gordon, the author of the Musical Aptitude Profile,
maintains that the instrument can be used to help college music
administrators and teachers in the diagnosis of individual musical strengths and
weaknesses.^12 In the measurement of nonperformance musical behaviors
with this norm-referenced measure, however, the comparison is between
observed behavior and norms based upon the test performance of a
representative sample of subjects. Although this may be of some value,
it is not identical to using a criterion-referenced measure.
Furthermore, the use of an aptitude measure such as the Gordon test, designed
to predict or forecast over an extended period of time, is somewhat
questionable for diagnosis of present strengths and weaknesses in
relation to a current instructional process.^13

^10 Edwin Gordon, Musical Aptitude Profile (Boston: Houghton
Mifflin Company, 1965).

^11 Warren Gates Hatfield, "An Investigation of the Diagnostic
Validity of the Musical Aptitude Profile with Respect to Instrumental
Music Performance" (unpublished doctoral dissertation, The University of
Iowa, 1967), Dissertation Abstracts, XXVII (January-February, 1968),
3210A.

^12 Edwin Gordon, "Implications for the Use of the Musical Aptitude
Profile with College and University Freshman Music Students," Journal of
Research in Music Education, XV (Spring, 1967), 34.

Douglas grouped freshman music majors at the University of Georgia
in the fall of 1964 into a tripartition of high, median, and low.
The high group immediately began the study of music theory, while the
median and low groups received one and two quarters of preparatory
instruction respectively. Douglas found that a greater percentage of
students could ultimately cope with theory as a result of being grouped,
and suggested that the combination of tests used to make the
tripartition, his own test plus the Aliferis Music Achievement Test (College
Entrance Level), could be useful for counseling purposes.^14

The Aliferis test consists of six subtests: "Melodic Elements,"
"Melodic Idioms," "Harmonic Elements," "Harmonic Idioms," "Rhythmic
Elements," and "Rhythmic Idioms." All items require some form of
aural-visual discrimination, i.e., the student relates what he hears to an
array of visual stimuli. Such discriminatory skills are helpful in the
study of music theory; the Aliferis test was undoubtedly useful in
making Douglas's tripartition. But it is a norm-referenced
standardized test. The manual carefully presents norms for each section of the
test, regional norms, national norms, and norms for various types of
institutions.^15 Hence, the students in the Douglas study were compared
with each other. A criterion-referenced measure could have been used
to compare the students with University of Georgia theory standards,
assuming that the standards could have been stated in a manner
conducive to the construction of test items.

^13 Robert Glaser, Evaluation of Instruction and Changing
Educational Models, C.S.E.I.P. Occasional Report No. 13 (Los Angeles:
University of California at Los Angeles, Center for the Study of
Evaluation of Instructional Programs, 1968), pp. 12-13.

^14 Charles Herbert Douglas, "Measuring and Equalizing Music Theory
Competence of Freshmen College Music Majors" (unpublished doctoral
dissertation, The Florida State University, 1965), Dissertation
Abstracts, XXVI (February, 1966), 4712.

The tests developed by Ball, Perry, and Douglas are representative 
of the usual measuring instrument constructed for the purpose of meas- 
uring musical behaviors of entering students. Test items are selected 
on the basis of their powers of discrimination rather than on the basis 
of their relationship to pertinent criteria of performance. No crite- 
rion-referenced measure for the purpose of measuring nonperformance 
musical behaviors was known to the researcher at the onset of the test 
development reported herein. There was, however, significant interest 
in criterion-referenced testing outside of the field of music. 

CRITERION-REFERENCED TESTING 

The distinction between criterion-referenced and norm-referenced
tests is made by Glaser in terms of differing kinds of primary
information obtainable from the two forms of tests. Criterion-referenced
measures provide information regarding the degree to which criterion
performance has been attained; norm-referenced measures provide
information regarding the relative ordering of individuals in terms of their
observed achievement.^16 Popham and Husek clarify the distinction by
explaining that norm-referenced measures generally imply a concern for
selectivity, while criterion-referenced measures imply a concern for
competence in an individual or the efficacy of a treatment.^17

^15 James Aliferis, Aliferis Music Achievement Test (College
Entrance Level) (Minneapolis, Minnesota: University of Minnesota Press,

Glaser and Klaus discuss criterion-referenced measures in relation
to job training. They refer to a continuum of skill at a given task
that ranges from no proficiency at all to perfect performance. The
behaviors which an individual displays during testing of this skill fall
at some point on the skill continuum, and the degree to which these
behaviors resemble desired or criterion behaviors can be assessed by a
criterion-referenced measure. Criterion levels are also ordered on a
continuum; they can be established at any point where it is necessary
to obtain information as to the adequacy of an individual's learning.
Specific behaviors expected at a given level of proficiency, such as the
college entrance level, may be identified and used to describe specific
tasks which the individual is to perform.^18

^16 Robert Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," American Psychologist, XVIII
(August, 1963), 520.

^17 W. James Popham and T. R. Husek, "Implications of
Criterion-Referenced Measurement," Journal of Educational Measurement, VI (Spring,
1969), 1-9.

^18 Robert Glaser and David J. Klaus, "Proficiency Measurement:
Assessing Human Performance," Psychological Principles in System
Development, Robert M. Gagné, editor (New York: Holt, Rinehart, and
Winston, Inc., 1962), pp. 421-422.
The lack of consensus among music educators as to what musical
outcomes are to be expected as a result of instruction does not mean
that criteria for a criterion-referenced test cannot be selected.
Arbitrary standards may be established by the faculty of a given
institution with regard to their own philosophy, experience, and view of
music education. Glaser and Klaus state:

. . . the lack of well-defined system standards does
not preclude the use of criterion-referenced measures.
Arbitrary proficiency levels can be established for
minimum performance. For instance, it is possible to
select standards in academic training which reflect
decisions as to the least amount of end-of-course
competency the student is expected to attain . . . it is
possible to use the maximum amount of course content
presented to the student as a standard.^19

A conceptual basis for criteria and objectives of a
criterion-referenced test in music may be found in an Interim Report of the MENC
Commission on Teacher Education, wherein the Commission states certain
competencies that should be displayed by qualified music educators as a
result of their teacher training experience. The Commission indicates
that all music educators should display skills in performance,
composition, and analysis. Of particular importance for the research reported
herein is the Commission's endorsement of competency in the supervision
and evaluation of the performance of others and competency in the
identification of compositional devices. The researcher's test of certain
nonperformance musical behaviors was completed and administered prior to
the appearance of the report; however, future work may draw
increasingly upon the Commission's publication.^20

^19 Glaser and Klaus, p. 426.

1 Once criteria and objectives related to skills cr competencies are 

( 

established, It may be desirable to obtain Information about an Indi- 
I vidual's degree of skill or competency. Norm-referenced measures do not 

j provide much Information regarding Individual degrees of skill or com- 

petency; they provide comparisons between a particular Individual’s test 

I 21 

performance and the performance or other members of his group. 

Norm-referenced tests suggest grouping those who are tested into a
normal distribution. Bloom notes that although the normal distribution
is the distribution most appropriate to chance and random activity,
education is a purposeful activity. The distribution of student
achievement, therefore, should be quite different from the normal
distribution if teachers are effective in their instruction. Relative
standards are inappropriate if teachers desire to bring all their
students to a criterion level.²²

Glaser indicates that criterion-referenced tests do not group students
into a normal distribution. Such tests provide individual information
independent of reference to the performance of others because
criterion-referenced tests indicate the correspondence between an
individual's observed behavior and an underlying continuum of
achievement.²³

²⁰ MENC Commission on Teacher Education, "Teacher Education in
Music: An Interim Report of the MENC Commission on Teacher Education,"
Music Educators Journal, LVII (October, 1970), 38-41.

²¹ Glaser and Klaus, p. 422.

²² Benjamin S. Bloom, "Learning for Mastery," Evaluation Comment,
I (May, 1968), 2-3.

Popham and Husek discuss differences between criterion-referenced
measures and norm-referenced measures in terms of item selection; they
state that the writer of norm-referenced measures, in an effort to
promote variant scores for the purpose of discriminating among
individuals, rejects test items that are quite difficult or quite easy.
The writer of the criterion-referenced measure is concerned with whether
or not the test items represent the desired class of behaviors.²⁴ The
inappropriateness of deliberately promoting a spread of scores when one
is concerned with group achievement of criterion behaviors is also
discussed by Glaser and Cox,²⁵ while Cox and Vargas suggest that item
selection for a criterion-referenced measure may be more profitably
conducted by evaluating items through a pretest-posttest method to
determine the items' ability to indicate whether or not instruction
benefited the student.²⁶ An item with a difficulty index of 0.00 or 1.00
might be rejected as nondiscriminating by the writer of the
norm-referenced test, but such an item on a criterion-referenced test
may be clear evidence that a criterion behavior has or has not been
attained.

²³ Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," 519-520.

²⁴ Popham and Husek, 4.

²⁵ Robert Glaser and Richard C. Cox, "Criterion-Referenced Testing
for the Measurement of Educational Outcomes," Instructional Process and
Media Innovation, Robert A. Weisgerber, editor (Chicago: Rand McNally
and Co., Inc., 1968), p. 549.

²⁶ Richard C. Cox and Julie S. Vargas, "A Comparison of Item
Selection Techniques for Norm-referenced and Criterion-referenced Tests"
(paper read at the annual meeting of the National Council on Measurement
in Education, February, 1966, Chicago).
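The contrast drawn above between the two kinds of item selection can be illustrated with a brief computational sketch. It is not taken from the studies cited. It assumes that the difficulty index is the proportion of examinees answering an item correctly, and it reads the pretest-posttest evaluation as the difference between the posttest and pretest proportions, which is one common interpretation of that method. The function names and the response data are hypothetical.

```python
def difficulty_index(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def pre_post_index(pretest, posttest):
    """Gain in proportion correct from pretest to posttest (assumed reading of the method)."""
    return difficulty_index(posttest) - difficulty_index(pretest)


# Hypothetical item: no examinee succeeds before instruction, every examinee succeeds after.
pretest = [0, 0, 0, 0, 0]
posttest = [1, 1, 1, 1, 1]

# A posttest difficulty index of 1.00 gives no spread of scores, so a
# norm-referenced test writer might discard the item as nondiscriminating.
print(difficulty_index(posttest))

# The same item shows the largest possible pretest-posttest gain, which a
# criterion-referenced test writer could read as evidence of attainment.
print(pre_post_index(pretest, posttest))
```

Under these assumptions the same item is the least useful for spreading examinees apart and the most useful for showing that instruction produced the criterion behavior.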

COMPUTER TECHNOLOGY AND TESTING

The growth and increasing sophistication of computer technology in
recent years have major applications to testing. Entire computer-based
test development systems are feasible, both in schools and industry.
Tests of the criterion-referenced and norm-referenced variety can be
developed, presented, and analyzed at very rapid speeds.²⁷

The computerized presentation and analysis of a test initially
constructed off-line (i.e., without a computer) is perhaps less
sophisticated than computerized construction of a test from a vast bank
of potential items, but such presentations have been successfully
developed. Greer, for example, conducted a pioneering study of the use
of a computer to score and analyze a test and prepare a diagnostic
report. He concluded that computerized testing was feasible, and that it
increased efficiency and provided useful basic information at the United
States Naval Examining Center. It was recommended that educators
consider the computer for scoring, analysis, and diagnosis.²⁸ Williams
found the computer to be valuable for individual diagnosis and
evaluation in a reading program.²⁹

²⁷ Jack V. Edling, "New Media Applications," Man-Machine Systems in
Education, John W. Loughary, editor (New York: Harper and Row, 1966),
p. [illegible].

French developed a means of rapidly presenting and scoring test
items, sequentially arranged according to difficulty, for vocational
and technical students through the IBM 1050 computerized typewriter
terminal and 1410 computer. Numerical and verbal items were selected
from the Henmon-Nelson Tests of Mental Ability. Rather than presenting
every item to every student, French utilized an individualized branching
approach. The numerical test items were presented in order of
increasing difficulty in increments of eight; i.e., a student was asked
to respond to every eighth item. An incorrect response caused the
student to go back five items in the test program and be presented with
every second item. A second incorrect response branched the student back
five items and presented every item, omitting items that were previously
presented. Four misses out of seven items discontinued the test
program.³⁰
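The branching rules described for French's numerical test can be sketched in code. What follows is only an illustrative reconstruction under stated assumptions, not French's program: item positions are zero-based, the discontinuation rule is read as four misses among the seven most recently presented items, and the function name, its parameters, and the simulated student are all hypothetical.

```python
def french_style_test(n_items, is_correct, steps=(8, 2, 1), back=5,
                      window=7, max_misses=4):
    """Return the (item_index, answered_correctly) history of one sitting.

    Items are assumed to be ordered from easiest (index 0) to hardest.
    steps: spacing between presented items in each successive stage.
    back:  how many items to move back after a miss that changes stage.
    """
    history = []
    seen = set()
    stage = 0
    pos = steps[0] - 1            # the first item presented is the eighth
    while pos < n_items:
        if pos in seen:           # omit items that were previously presented
            pos += steps[stage]
            continue
        correct = is_correct(pos)
        history.append((pos, correct))
        seen.add(pos)
        recent = [ok for _, ok in history[-window:]]
        if recent.count(False) >= max_misses:
            break                 # assumed reading of "four misses out of seven"
        if not correct and stage < len(steps) - 1:
            stage += 1            # finer spacing after the first and second misses
            pos = max(0, pos - back)
        else:
            pos += steps[stage]
    return history


# Hypothetical student who can answer only the twenty easiest of sixty items.
history = french_style_test(60, lambda index: index < 20)
print([index for index, _ in history])
```

In this simulation the student is discontinued after eleven of the sixty items, which illustrates how such branching can shorten a test relative to presenting every item.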



²⁸ Harry Holt Greer, Jr., "The Application of a Digital Computer to
Scoring and Analysis of Examinations and the Preparation of Diagnostic
Reports" (unpublished doctoral dissertation, The George Washington
University, 1966), Dissertation Abstracts, XXVII (September-October,
1966), 923A.

²⁹ Gilbert Williams, "The Use of the Computer for Testing,
Programming, and Instruction," Research in Education, III (May, 1968),
195.

³⁰ Joseph L. French, "Numerical and Verbal Aptitude Tests
Administered at the CAI Student Station," Semi-Annual Progress Report
(prepared by Harold E. Mitzel, et al), Experimentation with
Computer-Assisted Instruction in Technical Education, Project No.
5-85-0[illegible] (University Park, Pa.: The Pennsylvania State
University Computer-Assisted Instruction Laboratory, 1967),
pp. [illegible].

The items were arranged in a linear order of difficulty in French's
test. The student commenced the test with an easy item and gradually
worked toward the difficult items. An alternative arrangement was
utilized by Hansen, who programmed test items from a midterm physics
examination at Florida State University for a computerized presentation.
The student commenced the Hansen test by responding to an item in the
middle of the difficulty scale. A correct response branched the student
to a harder item; an incorrect response branched the student to an
easier item. The student always moved ahead, but the difficulty of the
next item presented was determined by his response to the present
item.³¹
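The arrangement attributed to Hansen can likewise be sketched briefly. The description above does not say how far each branch moved along the difficulty scale, so the sketch assumes that the size of the move is halved after every response, which guarantees that no item is presented twice. The function name, the scale length, and the simulated student are hypothetical.

```python
def midpoint_branching_test(n_items, is_correct):
    """Return the (item_index, answered_correctly) history of one sitting.

    Items are assumed to be ordered from easiest (index 0) to hardest.
    Testing starts in the middle of the scale; a correct response branches
    to a harder item and an incorrect response to an easier one.
    """
    pos = n_items // 2
    step = max(1, n_items // 4)   # assumed size of the first branch
    history = []
    while 0 <= pos < n_items:
        correct = is_correct(pos)
        history.append((pos, correct))
        if step == 0:
            break                 # no finer branch remains
        pos += step if correct else -step
        step //= 2                # assumed: each branch is half the previous one
    return history


# Hypothetical student who can answer the thirteen easiest of twenty-one items.
for index, correct in midpoint_branching_test(21, lambda i: i < 13):
    print(index, correct)
```

Only four of the twenty-one items are presented in this simulation, and the last item presented lies at the boundary between items the student can and cannot answer.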

A concept of sequential testing is illustrated by the French and
the Hansen tests. In each case, the test items are arranged in a
purposeful nonrandom sequence. The use of the computer made it
possible for a student to substantially complete each test by taking
only certain items, depending upon his response history. A computer is
not essential to a sequential test if every student is to respond to
every item; Cox and Graham developed a sequential test based on a
sequence of arithmetic behaviors ordered according to a hierarchy of
difficulty upon which the ability to add two two-digit numerals
involving "carrying" appeared to be based.³²

In connection with other research with computer-assisted instruction,
tests have been utilized as part of the instructional process to
determine what sections of a computer-assisted course might be of most
benefit to the student. For example, Lippert and Ehlers developed for
computerization a set of test items reflecting competencies which an
entering graduate student in the social science area was believed to
require. These items were used to plot computer-assisted instruction
for the areas of weakness revealed by the test.³³ Deihl programmed a
diagnostic quiz at the beginning of the rhythm section of a
computer-assisted instruction course in certain skills of instrumental
music, developed with the assistance of the researcher. Based upon the
student's quiz performance, a decision was made to branch the student
through one or two remedial sections or to branch him directly to the
rhythm program.³⁴

³¹ Duncan N. Hansen, An Investigation of Computer-Based Science
Testing, FSU CAI Center, Semiannual Progress Report, Report No. 6
(prepared by Duncan N. Hansen, Walter Dick, and Henry T. Lippert)
(Tallahassee, Florida: Florida State University Computer-Assisted
Instruction Center, 1968), pp. 59-94.

³² Richard C. Cox and Glenn T. Graham, "The Development of a
Sequentially Scaled Achievement Test," Journal of Educational
Measurement, III (Summer, 1966), 147-150.



SUMMARY OF SELECTED LITERATURE 

Examination of pertinent literature indicates that tests developed
in recent years to measure entering musical behaviors in nonperformance
areas tend to be useful principally for the separation of entering
students into groups. Criterion-referenced testing has not been
investigated in the area of music. Computer technology may be utilized
for rapid test administration and analysis; it is particularly useful
for utilizing a student's response history in determining which test
items from a sequential test are to be administered. Thus, a conceptual
framework for the present research has been established.

³³ Henry T. Lippert and Walter Ehlers, Computer-Based Testing, FSU
CAI Center, Annual Progress Report, Report No. 7 (prepared by Duncan N.
Hansen, Walter Dick, and Henry T. Lippert) (Tallahassee, Florida:
Florida State University Computer-Assisted Instruction Center, 1968),
pp. 18-20.

³⁴ Ned C. Deihl, Development and Evaluation of Computer-Assisted
Instruction in Instrumental Music, Project No. 7-0760, ERIC No.
ED 035 314. (Washington: Office of Education, U. S. Department of
Health, Education, and Welfare, 1969), p. 22.

CHAPTER III



MATERIALS AND PROCEDURES 

Procedures followed in the development of the test and materials
are discussed in this chapter. The stages of development included the
development and formulation of objectives, development of test items,
empirical trial of test items, programming, and main test
administration.



DEVELOPMENT AND FORMULATION OF OBJECTIVES 
Importance of Objectives 

The construction of any test is impossible without some
conceptualization of what is to be measured. Tests are written because
test authors are seeking to determine whether or not certain expected
behaviors occur. Consequently, those behaviors and the means for their
recognition must be specified. In the case of achievement tests, such
behaviors must be related to instruction. Glaser states that it is
mandatory to specify minimum levels of achievement which indicate the
minimum level of competence a student should display at any crucial
point in an instructional sequence.¹ Glaser and Klaus maintain that the
specification of behavior which is to be observed and measured is the
initial step in the development of a measure of proficiency.² Lindvall
stresses that any plan to assess achievement must begin with a clear
specification of objectives.³ Regarding what he perceives to be a
beginning revolution in education, particularly in regard to
individualization of instruction and concentration upon mastery of
learning rather than discrimination among learners, Mitzel indicates
that achievement tests need to be keyed to course objectives, stated in
behavioral terms.⁴ Kibler, Barker, and Miles believe that test
preparation is simplified when evaluative measures are designed to
measure the success of instruction in terms of behaviors identical to
those specified in objectives.⁵ Lehman maintains that the most important
part of test construction is clearly defining the objectives of the
test.⁶ The test that was developed is criterion-referenced; Leonhard
and House state, ". . . the only criteria applicable to the music
program are the objectives."⁷

The formulation of objectives related to instruction appears to be
the necessary first step in test construction; prominent writers call
attention to the importance of objective construction, and appropriate
objectives seem valuable as tools for the conceptualization of what is
to be measured as well as statements of criteria for the development of
a criterion-referenced test. ("Instruction" here is used to represent
the sum of musical input received by the student prior to the
commencement of test administration, and is not limited to a particular
amount of input from any formalized course situation.)

¹ Robert Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," American Psychologist, XVIII
(August, 1963), 520.

² Robert Glaser and David J. Klaus, "Proficiency Measurement:
Assessing Human Performance," Psychological Principles in System
Development, Robert M. Gagne, editor (New York: Holt, Rinehart, and
Winston, Inc., 1962), p. 430.

³ C. M. Lindvall, Measuring Pupil Achievement and Aptitude (New
York: Harcourt, Brace, and World, Inc., 1967), p. 12.

⁴ Harold E. Mitzel, "The Impending Instruction Revolution," Phi
Delta Kappan, LI (April, 1970), 438.

⁵ Robert J. Kibler, Larry L. Barker, and David T. Miles, Behavioral
Objectives and Instruction (Boston: Allyn and Bacon, Inc., 1970), p. 13.

⁶ Paul R. Lehman, Tests and Measurements in Music (Englewood Cliffs,
New Jersey: Prentice-Hall, Inc., 1968), p. 79.

⁷ Charles Leonhard and Robert W. House, Foundations and Principles
of Music Education (New York: McGraw-Hill Book Company, Inc., 1959).

Selection of Objectives

Preparation of behavioral objectives checklist. To simultaneously
state valid objectives for undergraduate students in music education
and delineate criteria to determine the extent of attainment of the
objectives, a checklist of forty-two objectives written in the form,
"Given ______, the student will be able to ______," was prepared
and distributed by the researcher to faculty members of the Department
of Music Education and graduate students in music education at The
Pennsylvania State University during the summer term of 1969. The
forty-two statements of behavioral objectives were related to the
following arbitrarily selected nonperformance musical behaviors:

Aural recognition and identification of melodic intervals.

Aural recognition and identification of harmonic intervals.

Aural recognition and classification of major, minor,
augmented, and diminished triads.

Insertion of missing notes into visual notational displays
of aurally perceived melodies.

Insertion of missing notes into visual notational
displays of aurally perceived harmonic sequences.

Recognition of harmonically correct parts to complete
four-part harmonic passages.

Construction of harmonically correct parts to complete
four-part harmonic passages when one part is missing.

Recognition and location of aural-visual rhythmic
discrepancies.

Selection from arrays of explanations of appropriate
explanations of incorrectly performed rhythmic patterns.

Recognition and location of incorrectly notated
measures for given meter signatures.

Selection of the members of pairs of examples that are
performed "better" when "better" refers to tapered phrase
endings, dynamics, appropriateness of breathing, or
appropriateness of articulation style.

Indication of the appropriateness of overall interpretation
of examples and identification of inappropriateness as
being due to inappropriate tempo, inappropriate articulation,
excessive rubato, lack of rubato, or inappropriate dynamics.

Classification of examples as being representative of
Medieval, Renaissance, Baroque, Classical, Romantic, or
Modern Periods.

Selection of the members of pairs of examples containing
ornamentation (trills, grace notes, mordents, gruppetti) that
are performed in the more appropriate style.

Three behavioral objectives, varying in the size of the array of
choices available to the student and/or the number (five, ten, or
twenty) of twenty examples to which the student was to respond
correctly, were constructed for each behavioral area. Respondents were
asked to indicate whether they believed each objective to be appropriate
for freshman music education majors, seniors, both groups of students,
or neither group. The original research proposal had called for
separate sets of items for entering students and students near
graduation; therefore, there were separate "freshman" and "senior"
categories.

Analysis of the checklist. Perhaps the checklist (to which six
faculty members and twenty-nine graduate students responded) would have
been more useful had the respondents been asked to rate each objective
as "appropriate" or "inappropriate" for "music education students."
Respondents seemed to have difficulty classifying according to freshmen
and seniors.

Further difficulty in analyzing the checklist was experienced when
statistical tests were considered to seek any trends in the data for
each objective. The χ² one-sample test, originally planned, was
abandoned because it shows only that observed frequencies do or do not
deviate significantly from expected frequencies; what the expected
frequencies should be was not clear. The Kolmogorov-Smirnov one-sample
test was applied to the graduate students' data for each objective by
ordering the four categories of responses on a difficulty continuum
running freshmen → both → seniors → neither, but the abandonment of this
statistical test appeared advisable because, although significant
deviations from expected cumulative frequencies were revealed,
particularly when "senior" and "neither" categories were heavily
checked, considerable doubt was raised about the appropriateness of
ordering essentially discrete data on a continuum. Respondents may have
differed widely in their interpretation of the "both" and "neither"
categories; they may not have checked them in terms of difficulty. The
application of a binomial test to each objective by formulation of
dichotomies of "most frequent response-all other responses" was believed
to show any strong trend to one category where such a trend existed, but
the small size of the faculty "sample" made the test inappropriate for
that group.
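A minimal sketch of a binomial test on the "most frequent response-all other responses" dichotomy follows. It assumes a null probability of one-half for the most frequent category and a one-tailed test; the report states neither, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Probability of k or more responses in one category out of n, given probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# With six faculty respondents, unanimous agreement gives a probability of 1/64.
print(binomial_upper_tail(6, 6))

# Five of six respondents in agreement gives 7/64, which exceeds the
# conventional 0.05 level.
print(binomial_upper_tail(5, 6))
```

Under these assumptions only a unanimous faculty response could reach the conventional 0.05 level, which is consistent with the observation that the test was unsuited to a group of six.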

Because faculty opinion of any proposed objective was considered to be
of prime importance, it was decided to reject any objective that two or
more faculty members had checked as being inappropriate for either
group. Objectives thus rejected totalled eleven; all rejected
objectives had asked the student to respond correctly to twenty of
twenty items. None of the fourteen categories of nonperformance musical
behaviors was completely rejected; i.e., in no case were all three
objectives formulated for a particular area checked as appropriate for
neither group.
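The rejection rule just described amounts to a simple count, as the following sketch indicates. The objective labels and the faculty responses shown are hypothetical and are not the checklist data; only the threshold of two "neither" responses comes from the text.

```python
def rejected_objectives(faculty_responses, threshold=2):
    """Return objectives that at least `threshold` faculty members marked "neither".

    faculty_responses maps an objective label to the list of categories
    ("freshmen", "seniors", "both", or "neither") checked by faculty members.
    """
    return sorted(
        objective
        for objective, responses in faculty_responses.items()
        if responses.count("neither") >= threshold
    )


# Hypothetical responses from six faculty members to three objectives in one area.
responses = {
    "melodic intervals, 5 of 20": ["both", "both", "freshmen", "both", "seniors", "both"],
    "melodic intervals, 10 of 20": ["both", "seniors", "seniors", "both", "neither", "both"],
    "melodic intervals, 20 of 20": ["neither", "seniors", "neither", "seniors", "neither", "both"],
}
print(rejected_objectives(responses))
```

In this illustration only the objective requiring twenty correct responses of twenty is rejected, so the behavioral area itself is retained.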

Qualitative analysis of faculty and student feedback was more
illuminating than the attempts at statistical analysis. One frequent
point raised was the difficulty of judging the appropriateness of an
objective without seeing and hearing the test items to be associated
with the objective. Some faculty members questioned whether the
traditional tasks of interval and triad recognition were really
indicative of any desirable competencies for music educators.

Choice of objectives. No behavioral area was completely rejected,
and, to a certain extent, final judgment of the appropriateness of an
objective appeared to depend upon the resultant test items. The
behavioral objectives checklist and the behavioral areas upon which the
checklist objectives had been based were reviewed; the following
nonquantitative objectives for music education students were stated to
provide a basis for item construction:

1. The music education major should aurally recognize
and identify melodic intervals.

2. The music education major should aurally recognize
and identify harmonic intervals.

3. The music education major should aurally recognize
and classify major, minor, augmented, and diminished triads.

4. The music education major should insert missing
notes into visual notational displays of aurally perceived
melodies.

5. The music education major should recognize and
locate aural-visual pitch discrepancies in four-part harmonic
passages.

6. The music education major should recognize and
locate aural-visual rhythmic discrepancies.

7. The music education major should select from arrays
of explanations appropriate explanations of incorrectly
performed rhythmic patterns.

8. The music education major should recognize and
locate incorrectly notated measures for given meter
signatures.

9. The music education major should select the members
of pairs of examples that are performed "better" when "better"
refers to tapered phrase endings, dynamics, appropriateness
of breathing, or appropriateness of articulation style.

10. The music education major should identify and
classify inappropriateness of interpretation when the
inappropriateness is due to inappropriate tempo,
inappropriate articulation, excessive rubato, lack of rubato, or
inappropriate dynamics.

11. The music education major should classify examples
as being stylistically representative of the Baroque,
Classical, Romantic, or Modern Period.

12. The music education major should classify examples
as being stylistically representative of acid rock, soul,
country-western, pop standard, "bubblegum", folk, folk
rock, or blues.

In its Interim Report, the MENC Commission on Teacher Education
presented a broad list of musical competencies, including skills in
performance, composition, and analysis, which should result from a total
undergraduate program in music education. The objectives stated above
are all conceptually germane to one or more of the competencies
suggested by the Commission. Objectives one, two, three, four, and five,
for example, may be deemed relevant to the Commission's call for
competency in the identification of compositional devices and the
organization of sounds for personal expression. Relevancy is apparent
between the Commission's declaration that music educators need to be
effective in the supervision and evaluation of the musical performance
of others and objectives five, six, seven, eight, nine, and ten.
Numerous other relationships may be evidenced upon comparison of the
objectives and the Commission's report.⁸

The list of objectives was not intended to cover comprehensively
the universe of nonperformance musical behaviors; it was intended to
provide a working list of expected behaviors upon which to build test
items. The ambiguity which results from the lack of numerical criteria
and indication of a time and place at which the behavior should occur
is intentional. The test which was constructed measures, within each
area tested, the degree to which, in terms of the number of items on a
scale ordered in empirically established difficulty levels, a behavior
is mastered. Prior to receipt of an undergraduate degree in music
education, at some point in time, a music education major, in the
opinion of the researcher as substantiated by members of a music
education faculty, ought to display the behaviors listed. The
criterion-referencing of the test derives from the construction of items
in reference to expected behaviors, rather than from specific course
objectives or a series of behaviors prerequisite to a criterion
behavior. For research purposes, it was deemed sufficient to construct
test items in relation to the list.

⁸ MENC Commission on Teacher Education, "Teacher Education in
Music: An Interim Report of the MENC Commission on Teacher Education,"
Music Educators Journal, LVII (October, 1970), 39-41.

DEVELOPMENT OF ITEMS

Selection of Musical Materials 

Musical materials selected for item construction included melodies
chosen from pedagogical and orchestral literature, chorales, and
recordings of various styles of music. Although selection of material
was made with its usefulness for future test items in mind, no
particular musical example was selected for any particular test item.

Item Construction

Appropriate musical excerpts were examined in light of objectives.
In a broad sense, all test items ask the student either to classify or
to detect a discrepancy between what he sees and what he hears. There
was a conscious effort to vary the difficulty of items within each
section. A variety of instruments was utilized for recording; length of
excerpt and apparent saliency of aural-visual discrepancies were varied.
Thirteen groups of test items were constructed and prepared for
empirical trial. Scales of twenty items each were planned for
computerization, but, in the initial construction stage, an excess of
items was developed to increase the likelihood of obtaining
satisfactory twenty-item scales.

Melodic intervals. The melodic intervals group⁹ consists of
seventy-eight pairs of successive tones played on piano, clarinet,
bassoon, baritone, tuba, flute, oboe, bass clarinet, horn, alto
saxophone, cornet, or trombone. The unison, minor second, major second,
minor third, major third, perfect fourth, tritone, perfect fifth, minor
sixth, major sixth, minor seventh, major seventh, and perfect octave
appear six times each, with the lower tone of the pair occurring once
within each of the octaves C₁ - C, C - c, c - c', c' - c", c" - c"', and
c"' - c"".¹⁰ In all cases the lower tone is played first. The student's
task is to choose the name of the interval from an array of
twelve names. No musical notation is viewed by the student.

⁹ Hereafter referred to as the MI group.

Harmonic intervals. The harmonic intervals group¹¹ is similar to
the MI group. The identical intervals are utilized, played
simultaneously, presented in a different order, and performed with
different instrumentation. Again, the stimulus is aural.

Triad classification. Major, minor, augmented, and diminished
triads are presented in the triad classification group.¹² The four
types of triads appear in root position, first inversion, and second
inversion with the lowest of three tones occurring once within each of
the octaves C - c, c - c', c' - c", and c" - c"'. The forty-eight
triads are played on piano or with various combinations of three wind
instruments utilizing flute, clarinet, oboe, bassoon, alto saxophone,
bass clarinet, cornet, horn, trombone, baritone, or tuba. The student
taking the test views no notation; after hearing a triad he is asked to
indicate whether the triad is major, minor, augmented, or diminished.
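The sizes of the interval and triad item pools follow from crossing the stated categories, as the short enumeration below indicates. It is offered only as an arithmetic check; the variable names are hypothetical, and the octave labels are plain-text stand-ins for the notation used in the text.

```python
from itertools import product

INTERVALS = [
    "unison", "minor second", "major second", "minor third", "major third",
    "perfect fourth", "tritone", "perfect fifth", "minor sixth", "major sixth",
    "minor seventh", "major seventh", "perfect octave",
]
# Six octaves for the lower tone of each interval (labels are stand-ins).
INTERVAL_OCTAVES = ["octave 1", "octave 2", "octave 3", "octave 4", "octave 5", "octave 6"]

TRIAD_TYPES = ["major", "minor", "augmented", "diminished"]
TRIAD_POSITIONS = ["root position", "first inversion", "second inversion"]
# Four octaves for the lowest tone of each triad.
TRIAD_OCTAVES = INTERVAL_OCTAVES[1:5]

# Each interval appears once with its lower tone in each octave.
interval_pool = list(product(INTERVALS, INTERVAL_OCTAVES))

# Each triad type appears in each position once within each octave.
triad_pool = list(product(TRIAD_TYPES, TRIAD_POSITIONS, TRIAD_OCTAVES))

print(len(interval_pool), len(triad_pool))
```

Thirteen intervals in six octaves give the seventy-eight interval items, and four triad types in three positions and four octaves give the forty-eight triad items.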

¹⁰ This notation is in accordance with that used in Robert W.
Lundin, An Objective Psychology of Music (2nd ed.; New York: Ronald
Press, 1967), p. 19.

¹¹ Hereafter referred to as the HI group.

¹² Hereafter referred to as the TC group.

Omitted notes. The omitted notes group¹³ requires the student to
follow the musical notation while he listens to a performance of the
musical excerpt. One note is missing from the notational display; the
"omitted" note is replaced by a question mark. After he hears one of
the seventy-three ON items, the student is asked to choose from an array
of four notes the note which represents the pitch he heard at the
location of the question mark. Examples of ON items are found in
Appendix A.

Erroneous notes. Four-part chorales are used in the erroneous
notes group;¹⁴ there are eighty items in the item pool. Each chorale is
performed by a woodwind group, a brass group, or a pianist. One note is
performed incorrectly in seventy of the EN items, and the student is
asked to indicate which one of four circled notes on the notational
display is incorrectly performed. Ten items ask the student to choose
from the entire display. Errors vary in assumed difficulty of detection
from incorrect pitches that disagree with the key signature to changed
doublings within triads.

Rhythmic discrepancies. Changes from notated rhythm occur within
a measure in the seventy-three items comprising the rhythmic
discrepancies group.¹⁵ The student indicates the number of the measure
containing the discrepancy, if any, between his aural and visual input.
Rhythmic errors include interchanged note values, omitted rests,
incorrectly performed patterns, and doubled or halved note values. Piano
and a variety of wind instruments are used to perform the items.

¹³ Hereafter referred to as the ON group.

¹⁴ Hereafter referred to as the EN group.

¹⁵ Hereafter referred to as the RD group.

Overall rhythmic inaccuracies. The overall rhythmic inaccuracies
group¹⁶ differs from the RD group; in an ORI item, the rhythm problem
occurs over more than one measure. The tempo or a pattern may be
consistently distorted. Certain items contain no inaccuracies.
Conventional multiple-choice format is used for the seventy-three items;
the student chooses his answer for each item from an array of four
explanations of the rhythmic inaccuracy. Appendix A contains examples of
ORI items.

Incorrect measure for signature. A strictly visual incorrect
measure for signature group¹⁷ asks the student to study four-measure
patterns written in one-line rhythmic notation and, for eighty items,
select the one measure, if any, that contains an incorrect total of
counts for the given meter signature.

Better phrasing. Two versions, labelled "A" and "B", of each of
seventy-three melodies are presented to the student in the better
phrasing group.¹⁸ The notation is displayed to the student; wind
instruments are used for the performance. The student's task is to
indicate whether the "A" or "B" version is phrased better, or to
indicate that there is no substantial difference. Most items do contain
a difference; one version contains an abruptly terminated note or an
unnatural interruption of the musical flow caused by inhaling at an
improper time.

Hereafter referred to as the EN group.

Hereafter referred to as the RO group.

Hereafter referred to as the ORI group.

Faulty interpretation. In a group of seventy-three faulty
interpretation items,¹⁹ a melody is performed on a wind instrument or
piano while the student follows the notation. In the manner of printed
music, the visual display contains certain information about tempo,
dynamics, and style in addition to notation. From an array of four
explanations, the student is asked to choose the one that best explains
what is wrong with the performance he is hearing. The "faultiness" of
any given interpretation may be due to lack of observance of dynamic
levels and changes, incorrect articulation style or pattern, choice of a
tempo not in agreement with the tempo marking, or excessive (or
insufficient) rubato. Examples of FI items may be viewed in Appendix A.

Questions might be raised regarding the testing of the recognition
of faulty interpretation because interpretation is likely to be rather
subjective and personal. The researcher shares Hoffren's view that
there are certain broad limits to acceptable interpretation. Teachers
are expected to guide the interpretation of their students along
culturally sanctioned lines.²⁰ When the music clearly indicates certain
guidelines regarding tempo, dynamics, or articulation, there are
deviations possible to an extent which could be classified, albeit
subjectively, as faulty interpretation.

¹⁹Hereafter referred to as the FI group.

²⁰James Hoffren, "A Test of Musical Expression," Council for
Research in Music Education, Bulletin No. 2 (Winter, 1964), 32.

Historical classification. A total of sixty-seven excerpts from
recordings are in the historical classification group.²¹ In one
version, the student is asked to indicate which one of four given years
is the best estimate of the year of composition of the excerpt he is
hearing.²² In the other version, the terms Baroque, Classical,
Romantic, and Modern are used in lieu of years;²³ other examples are
included in Appendix A.

Popular classification. The popular classification group²⁴
requires the student to classify the excerpt he hears as being
representative of acid rock, soul, folk, country-western, pop standard,
"bubble gum", or folk rock styles.

Broad categories. When the proposal was written, three broad
categories of items were proposed: pitch, rhythm, and interpretation.
The category of style was added after submission of the proposal. Item
construction, when concluded, yielded five groups in the pitch category
(MI, HI, TC, ON, and EN), three groups in the rhythm category (RD, ORI,
and IMS), two groups in the interpretation category (BP and FI), and two
groups in the style category (HC and PC).

²¹Hereafter referred to as the HC group.

²²Hereafter referred to as the HC(Y) group.

²³Hereafter referred to as the HC(L) group.

²⁴Hereafter referred to as the PC group.

PREPARATION FOR EMPIRICAL TRIALS AND PROGRAMMING

Cards. Item construction was, at first, largely conceptual. All
materials had been selected and the content of the test stimuli
determined, but it was believed necessary to have separate, discrete
records of the test stimuli. The test questions with their answer
arrays, the contents of the tapes in notation, and the content of
notational displays were placed on 5 x 8 cards. This lengthy
quasi-clerical process was justified because it would facilitate
recording and manipulation of item order.

Recording. With the exception of the IMS group, all item groups
required aural stimuli. The HC and PC excerpts were made via a Bogen
model B61 phonograph on a Wollensak model T-1980 tape recorder. The
other items were recorded using an Electrovoice dynamic cardioid
microphone, model 676, and a Wollensak model T-1980 tape recorder.
Scotch 175 tape was used. All aural stimuli were recorded monaurally on
the left channel. The right channel was kept clear for the future
addition of segments of 400 Hz tone; these tones function as signals to
the computer in the audio assembly process that is part of the
construction of software for the IBM 1500 Instructional System utilized
in this study.

The order of items within each group was randomized with the aid of
random number tables.²⁵ Tape recordings were made at the convenience of
the performers; i.e., all the clarinet excerpts were recorded together,
all the piano excerpts were recorded together, etc. It was necessary
to arrange the tapes into the proper random order through extensive
splicing.

²⁵Jerome C. R. Li, Statistical Inference, I (Ann Arbor, Michigan:
Edwards Brothers, Inc., 1964), pp. 589-598.



EMPIRICAL TRIALS OF TEST ITEMS 



Necessity to Establish 
Item Difficulty Indices 

Arrangement of the items within each section into a scale ordered
according to item difficulty was necessary to provide the bases for the
sequential or incremental aspects of the test. If item n + 1 is more
difficult than item n, the assumption can be made, theoretically, that
the student who answers item n + 1 correctly will also answer item n
correctly. Conversely, the student who is unable to answer item n
correctly may be assumed unable to answer item n + 1 correctly. Since
the test under development was planned to be incremental, i.e., every
student would not receive every test item, such assumptions were
necessary for a scoring procedure.

A conscious effort was made to vary the difficulty of items within
each section. Range, instrumentation, and apparent conspicuity of the
error were manipulated. Nevertheless, the difficulties of the completed
items were unknown. Any attempt to order items according to difficulty
would have been made on the basis of the researcher's personal estimate
of item difficulty figures. Therefore it was necessary to administer
each potential test item to undergraduate music education students to
obtain an empirical estimate of item difficulty.
Preparation of
Paper-and-Pencil Forms 

A separate set of paper-and-pencil forms was prepared for each test
section. Included in a set of forms were the response forms and, when
necessary, notation sheets containing the notated musical examples to
which the students were to listen. Conventional ditto masters, a
typewriter, and a ballpoint pen were utilized. The staff lines were
placed on a blank master with a typewriter. Notation was drawn
freehand, with the aid of an ordinary ruler. All alphameric material,
other than tempo markings, dynamic markings, and meter signatures, was
typed. With the exception of the EN notation sheets, the end products
were considered legible and adequate for the empirical trials.

Administration of Items 

A total of 920 test items was constructed. The number of items 
made it impossible to administer each item at The Pennsylvania State 
University in the course of one term of ten weeks duration. A total of 
thirteen discrete periods of time, one period per test section, would 
have been an unreasonable disruption of normal instructional activity 
in music education classes, so thirteen other Pennsylvania institutions 
offering an undergraduate curriculum in music education were contacted 
and requested to provide time and students. 

Of the thirteen institutions, six were able to offer the desired
assistance, including Westminster College (New Wilmington), Carlow
College (Pittsburgh), Bucknell University (Lewisburg), Susquehanna
University (Selinsgrove), Temple University (Philadelphia), and
Mansfield State College (Mansfield). Items were administered at those
six institutions and at The Pennsylvania State University. Because of
the difficulty of making scheduling arrangements, it was not possible to
conclude the empirical administration of test items in the desired ten
weeks; rather, it took approximately four months.

The propriety of establishing item difficulty indices at
institutions other than The Pennsylvania State University, the
institution for which the computerized test was being developed, may be
questioned. If the item difficulties established as a result of testing
at other institutions were grossly divergent from item difficulties that
would have been established at Penn State, the scaling of items
according to difficulty could lead to highly undesirable results. A
strong difference between the relative ordering of items administered to
Penn State students and the ordering obtained from students elsewhere
would be particularly disconcerting. This problem, however, was
partially alleviated by calculating coefficients of rank-order
correlation between the two orders of difficulty obtained for any
subtest administered at different institutions. Highly significant
coefficients (ρ's > .85) were interpreted as being indicative of the
necessary amount of consistency in difficulty rankings between two
groups.
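
The rank-order comparison described above can be illustrated with a minimal sketch in Python. The function names and the five difficulty values are hypothetical and are not taken from the study; only the .85 criterion comes from the text. Tied values receive the mean of their ranks, which is an assumption, since the treatment of ties is not described.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_order_correlation(x, y):
    """Rank-order (Spearman) coefficient: product-moment correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical difficulty indices for the same five items at two campuses.
site_a = [0.90, 0.75, 0.60, 0.40, 0.20]
site_b = [0.85, 0.80, 0.50, 0.55, 0.10]
rho = rank_order_correlation(site_a, site_b)
consistent = rho > 0.85  # criterion quoted in the text
```

In this invented example only two adjacent items exchange places between the campuses, so the coefficient remains above the stated criterion.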

Administrations were conducted from the end of January to the end
of April, 1970. In each case test forms and, when necessary, notation
sheets were distributed. Tape recordings were played on a Wollensak
T-1930 machine through the machine's internal speakers. The same
machine was used at all locations. Each test form had a code number.
Each student, identifying himself only by the code number of his test
form, completed a data card by providing information regarding his
institution, class standing, principal performing background, and
curriculum. The purpose of the test was explained to the students; the
point was stressed that the test itself, rather than the students, was
being tested.

Melodic intervals. The MI test was administered to twelve students
at The Pennsylvania State University on April 24, 1970 and to
twenty-four students at Temple University on April 27, 1970. Each
interval was played twice in anticipation of a repeat option that would
be programmed into the computerized version of the final test. The tone
quality of the tape appeared adequate for the purpose. Students at each
location tended to feel that the MI test was rather easy; this was
eventually supported by item difficulty data which showed a sparsity of
difficult (p ≤ .30)²⁶ items. Perhaps there would have been more
difficult items if some intervals had been presented in descending
order.

Harmonic intervals. On April 24, 1970 the HI test was administered
to twelve music education students at The Pennsylvania State University.
Twenty-one Temple University students had the test administered to them
on April 27, 1970. Each interval was played twice. The HI test was
apparently considerably more difficult than the MI test; there was a
sparsity of easy (p ≥ .70) items.

²⁶A proportion of students equal to or less than .30 answered
the item correctly. Item difficulty figures throughout this research
were computed, in the conventional manner, by dividing the number of
correct answers to each item by the number of students attempting each
item. See G. P. Helmstadter, Principles of Psychological Measurement
(New York: Appleton-Century-Crofts, 1964), p. 163.
Triad classification. Twenty-three undergraduate music education
students at The Pennsylvania State University received the TC test on
February 20, 1970; another twenty-one students at Susquehanna University
received the test on February 26, 1970. Each of the forty-eight triads
was repeated once. The tone quality of the tape was generally
satisfactory, but the less-than-perfect ensemble of the amateur
performance caused some distraction. Some of the more difficult triads
were made more difficult by recording them at close spacing with
combinations of instruments such as horn, trombone, and tuba. These
combinations were occasionally found to be annoying to students.
Perhaps the instrumentation occasionally made some triads, although
legitimate, unrealistic in the context of traditional homophonic music.

Omitted notes. Two groups, one consisting of thirteen students and
the other of nine students, were administered the ON test in a morning
and afternoon session at Westminster College on January 26, 1970. The
ON test was also administered to twenty-eight students at Carlow College
on February 16, 1970. As in the other tests in the broad area of pitch,
the ON test was administered with each tape-recorded item being played
twice. The quality of the notation sheets and the tape recordings
appeared quite adequate for the purpose. Most students seemed to feel
that it was unnecessary to repeat each item, but they welcomed the
repetition of the more difficult items.

Erroneous notes. The EN test was not successful. It was
administered to fifteen undergraduates in music education at Bucknell
University on February 23, 1970. Fifteen students were considered to be
an inadequate sample for the purpose of establishing item difficulty
indices; the administration of the EN test was never repeated because
the tape and, to an extent, the notation sheets were not adequate. All
EN items are chorales, and they were performed by a pianist, a woodwind
group, and a brass group. In spite of extensive recording sessions, the
ensemble performances, particularly those prepared by the brass group,
were inadequate. Error detection was further complicated by the sheer
length of the test; it probably would have been better to have
constructed fewer EN items. It was believed that the time necessary to
revise the EN test could be spent more profitably with other tests.

Rhythmic discrepancies. Thirty students at The Pennsylvania State
University received the RD test on February 5, 1970. The tape and
notation sheets were adequate, but there was a problem caused by
unintentional prompting. The student's task in the RD test is to follow
the notation and indicate the number of the measure where what he hears
is in rhythmic disagreement with what he sees. Since there is only one
answer, once a student detects a discrepancy he can immediately indicate
the measure. During the administration on February 6, a few students
tended to respond because other students did; if a pencil moved during
measure n of the performance, other pencils automatically followed.
Instructions should have been given to wait until the music stopped
before answering the item. Of course this would not be a problem in the
final computerized, individualized version, but some results of the
empirical trial may have been contaminated. Again, it was believed to
be better to spend on another test the time needed for retrial of the
RD test.
Overall rhythmic inaccuracies. Temple University was the site of
the administration of the ORI test on April 27, 1970. The administra- 
tion appeared to go smoothly; directions were clear, and tapes and 
notation sheets were adequate. There were no complaints from the 
twenty-five students regarding the amount of time required to answer the 
questions or the nature of the questions. 

Better phrasing. The BP test was administered twice. On
February 27, 1970, it was administered to twenty students at The
Pennsylvania State University; the second administration was to ten
students at Mansfield State College on April 28, 1970. The notation
sheets and tapes were adequate.

Faulty interpretation. No unforeseen problems occurred during
administration of the FI test on February 20, 1970 and February 26, 1970
to twenty-two students at The Pennsylvania State University and to
twenty-one students at Susquehanna University, respectively. The issue
of subjectivity was not raised by the students; there appeared to be
ample time to answer the questions. The quality of the notation sheets
and of the sound reproduction was adequate for the purpose.

Historical classification . The HC(Y) version was administered at 
Carlow College on February 16, 1970. The thirty-one students generally 
enjoyed the test; there were no difficulties with the test materials. 

The HC(L) version was administered to twenty-six students at 
Susquehanna University on February 27, 1970 and to thirteen students at 
The Pennsylvania State University on April 3, 1970. There were no 
difficulties. 
The HC(L) version of the HC test asks the student to choose a
letter to indicate his classification of each musical excerpt as
representative of the Baroque, Classical, Romantic, or Modern Period.
The HC(Y) version asks the student to choose from an array of four years
the one he believes is the most likely date of the excerpt's
composition. When the results of the empirical trials were examined and
twenty-item scales were selected from the HC(Y) and HC(L) item pools, it
was found that substantially different items were selected. Items that
were relatively difficult in one version were relatively simple in the
other version. It may be possible to conclude that students have
processes for classifying excerpts by years that are different from
their processes for classifying identical excerpts by musical periods.
The HC(L) version was chosen for future use as the HC test.

Nonadministered tests. Time became a crucial factor; two tests
were never administered. The PC test was developed after consultation
with an experienced radio and television man, but the categories of acid
rock, soul, folk, country-western, pop standard, "bubble gum," and folk
rock may not be ample. Rock music is often difficult to classify into a
discrete category; many examples are "hybrids" - stylistic indicators of
two or more styles may be present. The Music Educators Journal's
extensive treatment of youth music²⁷ suggests that perhaps the PC test
is in need of some conceptual revision prior to any administration.

²⁷Music Educators National Conference, "Youth Music - A Special
Report," Music Educators Journal, LVI (November, 1969), 43-74.

The IMS test was also unadministered. Unlike the other tests, the
IMS test contains no aural stimuli. Consequently, as time became
crucial, it was given a lower priority than the other rhythm area tests.

Postadministration Analysis 

Data analysis . Details of the data analysis will be reported in 
the succeeding chapter. An item difficulty index was computed for each 
item by dividing the number of correct responses to each item by the 
number of respondents attempting the item. 
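
As an illustration of the computation just described, a minimal Python sketch follows. The function names and the response data are hypothetical; the use of `None` to mark an item a student did not attempt is an assumption about how such data might be coded, not a description of the study's records.

```python
def item_difficulty(responses):
    """Proportion of attempting students who answered the item correctly.

    responses: one entry per student; True = correct, False = incorrect,
    None = item not attempted (assumed coding).
    """
    attempted = [r for r in responses if r is not None]
    if not attempted:
        raise ValueError("no student attempted the item")
    return sum(1 for r in attempted if r) / len(attempted)


def order_by_difficulty(pool):
    """Return item labels from easiest (highest index) to most difficult."""
    return sorted(pool, key=lambda label: item_difficulty(pool[label]), reverse=True)


# Hypothetical responses of five students to three items.
pool = {
    "item_a": [True, False, True, None, True],   # 3 of 4 attempts correct
    "item_b": [False, False, True, False, None], # 1 of 4 attempts correct
    "item_c": [True, True, True, True, True],    # 5 of 5 attempts correct
}
p_a = item_difficulty(pool["item_a"])
scale = order_by_difficulty(pool)
```

Note that a high index denotes an easy item, so a scale that runs from easiest to most difficult is sorted in descending order of the index.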

Using the difficulty indices as a guide, a twenty-item scale was
selected from the pool of items for each test. The responses given by
each student who participated in the testing sessions were written as a
series of coded answer strings, one string per student. Then, a
hypothetical answer string was written for each student, based upon the
responses the student gave to items that would have been presented to
the student in accordance with the programming strategy had the student
taken the test through the IBM 1500 Instructional System. Items that
would not have been presented in the computerized version were coded as
incorrect responses if they were higher in the scale (i.e., closer to
item 20) than the highest presented item answered correctly. Items not
presented that were lower in the scale than the highest presented item
answered correctly were coded as correct responses. Each student's
string of actual correct and incorrect responses to the selected items
for each test was compared with the hypothetical string of responses
that would have resulted from the student answering identically the
items presented through a computerized version of the test.
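
The coding rule for the hypothetical answer string can be sketched as follows. This is an illustrative Python reading of the rule, not the procedure actually used in the study (the strings were written by hand); the function name, the eight-item scale, and the set of presented items are hypothetical. The case in which no presented item is answered correctly is not covered by the text, and the sketch assumes that every unpresented item is then coded incorrect.

```python
def hypothetical_string(actual, presented):
    """Build the answer string the computerized version would have credited.

    actual:    list of bools in scale order (index 0 is item 1).
    presented: item numbers (1-based) the strategy would have presented.
    """
    presented = set(presented)
    # Highest presented item answered correctly; 0 if there is none (assumption).
    highest = max((i for i in presented if actual[i - 1]), default=0)
    result = []
    for item in range(1, len(actual) + 1):
        if item in presented:
            result.append(actual[item - 1])   # presented: keep the real response
        else:
            result.append(item < highest)     # unpresented: assumed by position
    return result


# Hypothetical eight-item scale; items 4, 6, 7, and 8 would have been presented.
actual = [True, False, True, True, False, True, False, False]
hypo = hypothetical_string(actual, presented=[4, 6, 7, 8])
mispredictions = sum(a != h for a, h in zip(actual, hypo))
```

Here items 2 and 5 were actually missed but lie below the highest presented item answered correctly, so both are credited as correct and count as mispredictions.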
The comparison of answer strings served as a basis for the
computation of descriptive statistics which showed in various ways
relationships between the empirical trial and proposed computerized
versions of the test segments. A product-moment correlation coefficient
showed the size and degree of relationship between the actual number of
correct responses on the selected twenty-item scale for each student and
the hypothetical number of correct responses that would have been
attributed to each student based upon the programming strategy. An
"accuracy" figure was computed by subtracting the number of
mispredictions of student responses resulting from the programming
strategy divided by the number of possible predictions from 1.00. A
correlated t test was applied to the distribution of N difference
scores, i.e., the actual number of correct responses subtracted from the
hypothetical number of correct responses for each student on each
twenty-item scale. The null hypothesis was that the mean of the
actual-hypothetical differences was not significantly different from
zero. A rank-order correlation figure was computed for twenty-item
scales selected from tests which were administered at more than one
campus to show the relationship of item difficulties at the two
locations. These data will be reported in the following chapter.
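
Two of these statistics, the "accuracy" figure and the correlated t test, can be sketched in Python as shown below. The function names and all scores are hypothetical. How the "number of possible predictions" was counted is not stated at this point in the text, so the sketch simply takes it as an argument. The t statistic is the conventional paired form, the mean difference divided by its standard error, which is assumed to correspond to the test the author applied.

```python
import math
import statistics


def accuracy(mispredictions, possible_predictions):
    """1.00 minus the proportion of mispredicted responses."""
    return 1.00 - mispredictions / possible_predictions


def correlated_t(hypothetical_scores, actual_scores):
    """Paired t statistic and degrees of freedom for hypothetical minus actual.

    Raises ZeroDivisionError if every difference is identical, because the
    standard deviation of the differences is then zero.
    """
    diffs = [h - a for h, a in zip(hypothetical_scores, actual_scores)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical scores of five students on one twenty-item scale.
hypothetical_scores = [11, 10, 13, 16, 9]
actual_scores = [10, 10, 12, 14, 8]
t, df = correlated_t(hypothetical_scores, actual_scores)
acc = accuracy(mispredictions=12, possible_predictions=200)
```

The resulting t would be referred to a t table with N - 1 degrees of freedom; a value near zero is what the null hypothesis of no mean difference predicts.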

Selection of tests for programming. Nine tests were developed and
administered to samples large enough to provide meaningful data, but the
number of tests to be programmed was limited to four. The amount of
time expended on the item development and empirical trial stages was far
greater than originally anticipated. Furthermore, a test limited to
four sections would fit concisely into the 75-minute class period at The
Pennsylvania State University, and more detailed analysis could be done
with fewer tests.

The original commitment, through the proposal funded by the U.S.
Office of Education, was to develop a prototype computerized,
criterion-referenced test which would purport to measure certain
nonperformance musical behaviors in the broad areas of pitch, rhythm,
and interpretation. The area of style was added to the overall design
after submission of the proposal. It appeared logical that the tests
selected for programming should represent each area.

The ON test was selected to represent the pitch area. It seemed to 
be the most musically interesting of the pitch tests because the items 
were melodies rather than isolated tonal stimuli. 

The ORI test was selected to represent the rhythm area. Of the two
rhythm tests that were administered, the ORI test appeared to have the
greater strength: the scale of difficulties yielded more nearly equal
intervals.

The FI test was selected to represent the interpretation area.
Taking the test seemed to require a broader range of thinking than the
BP test, and the empirical trials of the FI test had been quite
satisfactory.

The HC test, in the HC(L) version, was selected to represent the
style area. The HC(L) version was the one that had been successfully
administered to students at The Pennsylvania State University; the low
rank-order correlation of difficulty rankings (ρ = .53) between the
HC(L) and HC(Y) twenty-item scales indicated, in part, that the two
versions of the HC test were rather different.

For a more detailed explanation of the selection process, the
reader is referred to Appendix F.

PROGRAMMING THE COMPUTERIZED TEST 
IBM 1500 Instructional System 

The medium for presenting the computerized test was the IBM 1500 
Instructional System, housed in the Computer-Assisted Instruction Labo- 
ratory of The Pennsylvania State University. The self-contained system, 
operational at Penn State since January 1968, is designed for individu- 
alized instruction; its capacity for rapid access and coordination of 
stimuli and rapid processing of student responses makes the system 
useful for testing. 

Central to the 1500 System is the IBM 1131 Central Processing Unit 
which provides active storage for all system data. A vast amount of 
additional data may be brought into the central processing unit from 
disk cartridges mounted on IBM 2310 Disk Storage Drives. In addition to 
controlling the processing of data, the central processing unit controls 
the physical operation of the other components of the IBM 1500 Instruc- 
tional System, including a card read punch, a printer, and the com- 
ponents of the student instructional stations. 

The student instructional stations, also referred to as terminals
or stations, consist of a cathode ray tube screen (CRT), a typewriter
keyboard, an image projector, a light pen, and an audio unit. The
conventional arrangement of the instructional station places the CRT
mounted atop the typewriter directly in front of the seated student.
The image projector is to the left of the CRT; the light pen is to its
right. The audio unit is above the CRT.
The CRT resembles a television screen. Sixteen horizontal rows and
forty vertical columns may be coordinated to provide a total of 640
positions in which alphameric characters or special symbols, such as
musical notation, may be displayed. Characters most frequently appear
on the screen as white on a dark blue background. Test questions and
answer areas for the test reported herein are always displayed on the
CRT.

Students taking the test answer questions by firmly pressing the 
light pen to a lighted area on the CRT coded to the answer of their 
choice. The light pen receives light from the screen and transmits the 
location of the student response to the system which then takes the 
action for which it has been programmed, e.g., scoring a response. 

Although the typewriter may be used for input of student responses,
in the current test the typewriter is used only for initial student
contact with the computer ("signing on") and occasionally changing the
display on the CRT.

The image projector, containing a 7.5 by 9-inch screen on which
photographic images may be shown, is used for all displays of musical
notation. Image cartridges containing 16mm film may contain as many as
1,000 discrete photographs. The system has the capacity to access
individual image frames at the rate of 40 frames per second; therefore,
any particular combination of notational displays could be arranged in
a desired program sequence with no necessary consideration of image
access.

Headphones connected to the audio unit are used to present aural
stimuli. Tape cartridges mounted in audio units may contain as many as
two hours of taped messages. The four-track tape used in the cartridges
contains three message tracks and one digital signal address track to
allow the location of any particular message.

The Coursewriter II programming system is used with the IBM 1500
Instructional System. The author of material to be presented through
the system writes coded instructions in the Coursewriter language to
direct the presentation of content to the student. Material to be
printed on the CRT and its location, segments of tape to be played,
action to be taken in the event of specific student responses, and what
image to show must be programmed into the computer. An example of
Coursewriter programming from the computerized test may be viewed in
Appendix D.

Programming Strategy 

One principal characteristic of the computerized test of certain
nonperformance musical behaviors is its incrementalization. Originally,
a (+5), (-3), (+2), (-1), (+1) strategy was proposed; that is, the
student would start with the fifth item in a series of twenty. A correct
response would branch him ahead to the tenth item (an increment of
five), but an incorrect response would branch him back to the second
item (a reverse increment of three). After one error, the forward
increment, following a correct response, would be two. Occurrence of a
second error would branch the student back one item and change the
forward increment to one; a third error would terminate the
administration of the test section.
During the analysis of data obtained from empirical trials of test
items, it was apparent that the original strategy would tend to cause
premature terminations for some students. A straight linear strategy,
in which every student would receive every item, would result in no
mispredictions but would be inefficient and, to a computer programmer,
conceptually alarming. A modified linear strategy was adopted, in place
of the original strategy, as a compromise between duplication of
off-line results and efficiency in the number of items presented. Under
the modified linear strategy, a student starts with the fourth item in a
twenty-item scale. He continues to receive items in increments of four
as long as he emits no incorrect response. The first error causes a
reverse branch of three and changes the forward increment to one. The
student then continues ahead regardless of the correctness of a response
until he makes a total of five errors or three successive errors.
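
The modified linear strategy can be illustrated with a short simulation in Python. This is a sketch under stated assumptions, not a transcription of the Coursewriter program: the function name is hypothetical, the first error is assumed to count toward the total of five, successive errors are counted in order of presentation, and an item already presented is assumed to be skipped rather than presented a second time when the student moves ahead one item at a time.

```python
def modified_linear(would_answer, scale_length=20):
    """Return the item numbers presented, in order of presentation.

    would_answer: list of bools in scale order (index 0 is item 1) giving
    the response the student would make to each item if it were presented.
    """
    order = []
    seen = set()
    errors = 0          # total errors so far
    run = 0             # current run of successive errors
    item, step = 4, 4   # start at item 4, move ahead in increments of four
    while 1 <= item <= scale_length:
        if item in seen:            # assumption: never present an item twice
            item += step
            continue
        order.append(item)
        seen.add(item)
        if would_answer[item - 1]:
            run = 0
            item += step
        else:
            errors += 1
            run += 1
            if errors == 5 or run == 3:
                break               # termination rule from the text
            if errors == 1:
                step = 1            # first error: forward increment becomes one
                item -= 3           # and the student branches back three items
            else:
                item += step
    return order


# Hypothetical student who can answer items 1-10 but none of items 11-20.
student = [True] * 10 + [False] * 10
presented = modified_linear(student)
```

For this invented student only eight of the twenty items are presented before three successive errors end the section, which shows the saving the strategy was meant to achieve.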

Scoring Procedure 

Originally, the number of the most difficult item answered
correctly was planned to be the tested student's earned score.
Considerable study of student answer strings revealed that somewhat
spurious conclusions could result in instances where a student might
fail to answer numerous items but nevertheless manage to answer
correctly one item of high difficulty. Therefore, rather than using
scale scores, each student's score for each of the four programmed tests
was expressed simply in terms of the number of items answered correctly.
The student who answered more items correctly than another student
probably progressed further along the scale; he had fewer strings of
consecutive incorrect answers.
Audio Preparation

All test items had been recorded prior to empirical trials of test 
items. From conventional tape recordings, audio cartridges for the IBM 
1500 Instructional System must be prepared through a special process. 

The musical excerpts for the selected items were spliced into the
item order for the final computerized version. Using the IBM cue tone
generator and a Roberts model 1040 tape recorder, 400 Hz tone segments
were then placed on the right channels of the tapes. These 400 Hz cue
tones functioned as signals to the computer during the audio assembly
process; breaks in the continuity of the 400 Hz tone indicated the end
of one tape message (i.e., musical excerpt) and the beginning of another.

After the original tapes contained the cue tone, the audio assembly
process was activated. The tapes were mounted on an Ampex special model
tape recorder with remote control capacity. An IBM four-track tape
cartridge was mounted in the audio unit in one of the instructional
stations. A special computer program was utilized to duplicate each
message and assign to each message a unique digital address, thereby
permitting the accessing of any particular musical excerpt by the
Coursewriter program.

The master tape cartridge produced during the audio assembly
process was duplicated with a Viking model 235 tape duplicator to
produce the tape cartridges used in the administration of the test.

Film Preparation 

Film preparation included preparation of the art work, photography,
and film processing. The only stage with which the researcher was
directly involved was the preparation of the art work, i.e., notation
sheets. Each musical example was copied with a black felt tip pen onto
white paper ruled with staff lines. The quality of manuscript notation
was judged to be quite adequate for the purpose.

After photographing of the notation sheets, the film was processed 
through the regular channels utilized by the Penn State CAI Laboratory 
for the preparation of film cartridges. Five cartridges were made; each 
cartridge contained one exposure of each image, identified with a dig- 
ital address to permit access in the Coursewriter program. 



Debugging 

Extensive examination and trial of the Coursewriter program was 
conducted by the researcher to detect and remove faulty coding (i.e., 
"bugs") from the program. Grammatical errors such as invalid codes and 
erroneous parameters are of relatively little concern with the Course- 
writer programming system because the computer will not accept state- 
ments containing such errors. Subtle errors in programming can result 
from simple typographical errors, however; results quite different from 
those anticipated can be obtained because of a programmer's momentary 
lapses in accuracy. For example, during the debugging process, it was 
discovered that the score for the Historical Classification section was 
often inaccurate; the score indicated by the computer did not reflect 
the total number of correct responses accredited to the student. Inves- 
tigation located an error in the programming segment specifying action 
to be taken in the event of a correct response to the fourth item in the 
HC scale, the item initially presented to the student. The student was 
intended to receive four points since the assumption was made that items 
one, two, and three could have been answered correctly if item four was 
answered correctly. But an instruction that should have said ad 4//c4 
read ad 1//c4. A simple mistake in numerals caused inaccurate scoring. 

FINAL ADMINISTRATION 

Student Population 

The computerized test was administered to thirty-two students 
during the week of October 5-9, 1970. A parallel conventional version 
was administered during the same week to twenty-eight students. All 
students were undergraduate music education majors at The Pennsylvania 
State University. 

It was considered desirable to look for gross differences in scores 
between upper-term and lower-term students^28 because, if the criteria 
upon which test items were based are representative of competency devel- 
opment currently transpiring at Penn State, there should be such differ- 
ences. (Lack of such differences could be attributed to lack of 
sensitivity in the test as well as lack of representativeness in the 
criteria.) All first, second, third, and fourth term students (N = 36) 
were chosen to participate in the study, as well as all eighth, ninth, 
tenth, eleventh, twelfth, and over-twelfth term students (N = 36). Each 
student was randomly assigned to either the computerized version or the 
parallel conventional version, so that there were eighteen upper-term 
and eighteen lower-term students assigned to each testing situation. 
The desired number was fifteen students per term grouping per testing 
situation; the excess was to allow for loss of a few students. 

^28 The traditional terms "freshmen," "sophomores," "juniors," and 
"seniors" are rarely used at Penn State. The University academic year 
is divided into four ten-week terms; an undergraduate student is 
classified on the basis of his term standing. Since undergraduates in 
music education generally require twelve terms to complete their degree 
requirements, students classified as first, second, or third term could 
be called "freshmen," students classified as fourth, fifth, or sixth 
term students could be called "sophomores," etc. 

Administrative Procedure, 
Computerized Version 

Students assigned to the computerized version were assigned a time 
to report to the Computer-Assisted Instruction Laboratory during the 
week of October 5-9, 1970. Upon arrival for his testing session, each 
student was assured by the researcher that the test rather than the 
student was being tested. Operation of the light pen was explained, and 
each student was shown how to adjust the volume of the audio unit out- 
put. The student was assured that the researcher would be available if 
needed, the door to the testing room was closed, and the test program 
was permitted to run its course. At the conclusion of the test, the 
student's four subtest scores were automatically output by a typewriter 
connected to the computer, and the researcher asked the student for an 
opinion. 

The items administered to each student were determined, in accord- 
ance with the programming strategy, by the response history of the stu- 
dent. The student was permitted to repeat a taped excerpt for any ON 
item once if he wished; the other items were played only once. If a 
student did not respond to any item within forty-five seconds of the end 
of the taped excerpt, that was considered to be an incorrect response. 
Administrative Procedure, 
Conventional Version 

Students assigned to the conventional testing condition were asked 
to report to a central location on October 6, 1970. The test which 
these students received was similar to the tests utilized for the 
earlier empirical trials. Each student used a mimeographed test form 
containing printed instructions and eighty test items identical to the 
items comprising the four twenty-item scales programmed for the com- 
puterized version. The necessary notation for each item appeared on 
mimeographed notation sheets. The original tapes were duplicated; these 
duplicates were then edited to provide approximately eight seconds of 
silence between examples in the ON and HC sections, and approximately 
twenty seconds of silence between items in the ORI and FI sections. 

The researcher administered the test. Students were assured that 
the test was being tested, rather than they. Tape-recorded instructions 
supplemented printed instructions; students were permitted to ask ques- 
tions. All ON items were repeated; other items were played once. 

Plan for Analysis of Data 

A questionnaire was appended to each test. Each student was asked 
which section of the test was the most difficult and the least difficult 
for him. He was asked whether, if he had a choice, he would have pre- 
ferred to take the computerized or conventional version. He was also 
asked to evaluate the quality of sound reproduction and notation as well 
as the amount of pressure he felt while taking the test. 
A procedure outlined by Medley^29 was utilized to investigate the 
equivalency of the two versions of the test. According to Medley, two 
tests are equivalent only if four stringent criteria are satisfied. The 
students must be ranked in the same order by the two tests, the vari- 
ances of errors of measurement must be equal, the variances of test 
scores must be equal, and the test means must be equal. These condi- 
tions are tested by means of F tests after analysis of variance summary 
tables, similar to those suggested by Hoyt^30 for estimating test reliabil- 
ity in terms of internal consistency, have been plotted. The Medley 
procedure was utilized because it might indicate the divergence of the 
computerized test from the conventional version, or, in gross terms, 
what price one must pay in terms of differing results for the conven- 
ience of computerized testing of this nature. 

SUMMARY STATEMENT OF METHOD OF TEST DEVELOPMENT 

The initial stage of test development was to frame a series of 
objectives which could be used as criteria upon which to build a crite- 
rion-referenced test. Test items were constructed in relation to those 
criteria. After empirical trial of test items, certain items were 
scaled according to difficulty, and four tests were selected for final 
administration. The computerized test and a parallel conventional test 
were administered to undergraduate music education majors, and the 
resulting data were analyzed. 

^29 Donald M. Medley, "A General Procedure for Testing the Equiva- 
lence of Two Tests" (paper read at meeting of the National Council on 
Measurement Usage in Education, February 19, 1957, New York). 

^30 Cyril Hoyt, "Test Reliability Obtained by Analysis of Variance," 
Psychometrika, VI (June, 1941), 153-160. 
CHAPTER IV 



RESULTS AND FINDINGS 

The purpose of this chapter is to present and interpret data from 
the item trial, selection, and final administration stages of the com- 
puterized and conventional versions of the test. The general procedure 
will be to discuss the purpose of the particular data collection and 
processing, present the data, and offer an interpretation of it. 

PRELIMINARY DATA 

Preliminary data include data gathered regarding test items prior 
to the final administration of the test. Item difficulty indices and 
data resulting from comparison between results from actually adminis- 
tering selected items to students and results from hypothetically 
administering items to students in accordance with a programming 
strategy are included. Such data are reported herein to aid the 
reader's understanding of the processes of development. 

Computation of Item 
Difficulty Indices 

After administration of a section or subtest to a group of under- 
graduate students majoring in music education, the items comprising that 
section were scored. The item difficulty index for each item was com- 
puted by dividing the number of correct responses to an item by the 
number of students attempting the item.^1 This was done for each section. 



^1 When a student failed to respond, his lack of response was never- 
theless considered to be an incorrect response and an "attempt." 
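
The computation just described can be sketched in a few lines of Python. This is only an illustration: the function name, the use of None for a missing response, and the sample data are assumptions made here and are not taken from the study.

```python
def item_difficulty_indices(responses):
    """Return the proportion of correct responses for each item.

    `responses` holds one list per student; each entry is 1 (correct),
    0 (incorrect), or None (no response). As in the footnote above, a
    missing response still counts as an attempt and as incorrect, so the
    divisor is always the number of students tested.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    indices = []
    for j in range(n_items):
        correct = sum(1 for student in responses if student[j] == 1)
        indices.append(correct / n_students)
    return indices


# Hypothetical data: three students, four items (not from the study).
sample = [
    [1, 1, 0, None],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
print(item_difficulty_indices(sample))
```

An index of 1.00 then means that every student answered the item correctly, and 0.00 that none did.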
Item difficulty indices obtained ranged from 1.00 (all students 
responded correctly to the item) to 0.00 (no students responded cor- 
rectly to the item). It might have been desirable to obtain item dif- 
ficulties in approximately equal numbers at equi-incremental points 
along the range (e.g., three items with ID = .95, four items with 
ID = .90, three items with ID = .85, . . ., four items with ID = .05), 
but based upon the empirical trials, items tended to cluster more toward 
the less difficult end of the scale. 

A twenty-item scale was selected for each subtest administered to 
twenty-five or more students. The primary criterion for selection of an 
item was the difficulty index; when more than one item was available for 
selection at a given level of difficulty, selection was also based upon 
musical criteria such as the quality of the performance. 

Table 1 shows the item difficulty indices for the twenty items 
selected for each subtest. It may be noted that the greatest amount of 
difference between any two adjacent items is .26; the least amount of 
difference is .00. 

Actual -Hypothetical Comparisons 

After selection of items for twenty-item scales, answer strings 
were written for each student to whom the subtest had been administered. 
An answer string consisted of a string of 1's, indicating correct 
responses, and 0's, indicating incorrect responses. For example, here is 
the answer string for one student's responses to the twenty-item ON 
scale: 

11111111111100100000 
Table 1 

Item Difficulty Indices of Selected 
Items for Twenty-Item Scales 

^a For identity of the abbreviations and descriptions of the test 
sections, the reader is referred to Chapter III, pages 33-38. 

This particular student was able to answer the first twelve items in the 
scale correctly.^2 After that, he was able to answer only the fifteenth 
item correctly. 

For mathematical convenience, the assumption was made that a stu- 
dent would respond to an identical item in an identical manner although 
the mode of presentation was different. This was believed to be a con- 
servative assumption because it denied the researcher the opportunity to 
expect nonequivalent responses and thus account for unexpected variance. 
If the items coded in the above answer string were presented to the same 
student through the IBM 1500 Instructional System, the student, if he 
behaved in accordance with the assumption, would again answer the first 
twelve items correctly, answer the next two incorrectly, correctly 
answer the fifteenth item, and miss the remaining five items. 

Once the assumption of equivalent responses to identical items was 
made, it was possible to construct hypothetical answer strings to rep- 
resent a student's responses in accordance with a programming strategy. 
Here is a comparison between the hypothetical answer string for the 
above student, in accordance with the programming strategy eventually 
adopted, and the actual answer string that resulted from the empirical 
trials of the ON items: 

Hypothetical : 1 1 1 _1_ 1 1 1 _1_ 1 1 1 _1_ _0_ _0_ 0 _0_ 0 0 0 0 

Actual : 11111111111100100000 



^2 In the empirical trials conducted to obtain the item difficulty 
indices, the order of item presentation was determined with the aid of a 
random number table. Hence, the order of presentation of the twenty 
items eventually chosen to comprise the scale was not, at the time of 
the trials, 1, 2, 3, . . ., 20. 
The underlined numerals indicate items that would have been presented in 
the computerized version. The nonunderlined numerals in the hypothet- 
ical string indicate items for which a correct (1) or incorrect (0) 
response was assumed. In this case, the student would have been pre- 
sented with six items and earned a score of twelve. His correct answer 
to the fourth item, his initial item, would have branched him to the 
eighth item. The correct answer to the eighth and then to the twelfth 
item would have continued the increment of four. The incorrect response 
to the sixteenth item would have caused a reverse branch to item 
thirteen and changed the forward increment to one. Items thirteen and 
fourteen would have been answered incorrectly; under the assumption, 
the three successive errors (sixteen, thirteen, fourteen) would have 
terminated the ON test for this student. 

From a series of comparisons between answer strings, it was 
possible to compute various descriptive statistics. One statistic upon 
which importance was placed by the researcher was the correlation 
between the actual scores of students for each twenty-item scale and the 
hypothetical scores that would have resulted from a computerized ver- 
sion. The original programming strategy was abandoned, in part, because 
the revised strategy adopted raised these correlations. These figures 
are reported in Table 2. 

Responses to items that would not have been administered to a stu- 
dent were assumed to be correct if they were to items of less difficulty 
than the last item administered, and assumed to be incorrect if they 
were to items of greater difficulty than the last item administered. 
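
The rule for filling in nonadministered items can be sketched as follows. The branching logic that decides which items are presented is not modeled; the order of presentation is simply supplied as input. The function name is an assumption made for illustration, and "last item administered" is read here as the last item in time order, which is the reading that reproduces the score of twelve in the ON example above.

```python
def hypothetical_string(actual, administered):
    """Build a hypothetical answer string for one student.

    `actual` is the string of 1's and 0's from the empirical trial;
    `administered` lists the 1-based item numbers in the order the
    branching program would have presented them.
    """
    last = administered[-1]      # last item presented, in time order
    shown = set(administered)
    codes = []
    for item in range(1, len(actual) + 1):
        if item in shown:
            codes.append(actual[item - 1])   # equivalent-response assumption
        elif item < last:
            codes.append("1")    # easier than the last item: assumed correct
        else:
            codes.append("0")    # harder than the last item: assumed incorrect
    return "".join(codes)


# ON example from the text: items 4, 8, 12, 16, 13, 14 are presented.
actual = "11111111111100100000"
hypo = hypothetical_string(actual, [4, 8, 12, 16, 13, 14])
print(hypo, hypo.count("1"))   # 11111111111100000000 12
```

The student's score is the count of 1's in the resulting string, i.e., the items actually answered correctly plus the items for which a correct answer was assumed.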
The following is a hypothetical answer string that would have resulted, 
under the assumption of equivalent responses, from a student receiving 
the ORI scale in accordance with the adopted programming strategy: 

1 1 1 _1_ 1 1 1 _1_ _0_ _1_ _0_ _0_ _1_ _1_ _0_ _0_ 0 0 0 0 

Here, the student was hypothetically administered ten items, those items 
for which the response codes are underlined. The response codes for 
nonadministered items are assumed to be 1 (correct) if they are of less 
difficulty than the last item administered and 0 (incorrect) if they are 
of greater difficulty than the last item administered, item sixteen. 

The ten codes for nonadministered items may be said to represent pre- 
dictions of responses. 

Consideration of the same hypothetical ORI answer string when it is 
matched with the actual answer string yields the following: 

Hypothetical : 1 1 1 _1_ 1 1 1 _1_ _0_ _1_ _0_ _0_ _1_ _1_ _0_ _0_ 0 0 0 0 

Actual : 11110011010011001000 

Of the ten predicted responses, it is apparent that there were mispre- 
dictions for items five, six, and seventeen. The remaining seven pre- 
dictions were accurate. The quantity of mispredictions for a given 
student could vary from zero to twenty minus the number of items 
administered; in mathematical language, 

$$0 \le M \le (20 - A),$$ 

where M indicates the number of mispredictions for a given student and 
A indicates the number of items hypothetically administered to that 
student. By summing the number of mispredictions across all students, 
dividing that sum by the quantity obtained from subtracting the total 
number of items hypothetically administered from the total number of 
students multiplied by twenty (i.e., the total number of predictions), 
and subtracting the quotient from 1.00, it is possible to obtain an index 
of prediction accuracy. The formula for the index of prediction accu- 
racy for a subtest may be written as 

$$P = 1.00 - \frac{\sum M}{20n - \sum A},$$ 

where P represents the index of prediction accuracy, M represents the 
number of mispredictions for a student, A represents the number of items 
hypothetically administered to a student, and n represents the number of 
students to whom the subtest was administered. Indices of prediction 
accuracy are reported in Table 2. 
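
As a check on the arithmetic, the index can be computed with a short Python sketch. The function name and the (M, A) pair representation are assumptions made for illustration; the single data point is the ORI student discussed above, who was administered ten items and had three mispredictions.

```python
def prediction_accuracy(students, scale_length=20):
    """Index of prediction accuracy, P = 1.00 - sum(M) / (20n - sum(A)).

    `students` is a list of (M, A) pairs: M is the number of mispredictions
    and A the number of items hypothetically administered to one student.
    """
    total_m = sum(m for m, _ in students)
    total_a = sum(a for _, a in students)
    predictions = scale_length * len(students) - total_a
    return 1.0 - total_m / predictions


# The ORI student above: three mispredictions among ten predicted responses.
p = prediction_accuracy([(3, 10)])
print(round(p, 2))   # 0.7
```

For a whole subtest the list simply holds one pair per student, so that the sums run over all n students.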

When the students' actual scores for twenty-item scales were 
matched with their hypothetical scores, a series of difference scores 
(hypothetical minus actual scores) was computed. The aim was to have 
essentially the same scores result from hypothetical and actual ver- 
sions. A null hypothesis was formulated to state that there was no 
difference between the mean of the difference scores and zero. A corre- 
lated t test was applied for each subtest; as Table 2 indicates, the t 
values were nonsignificant except for the MI and HC(Y) tests. 
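
A correlated t test on difference scores can be sketched as below. This is the standard paired-samples statistic, offered as an assumption about the computation rather than as the researcher's exact procedure; the scores shown are invented for illustration and do not come from Table 2.

```python
import math
from statistics import mean, stdev


def correlated_t(hypothetical_scores, actual_scores):
    """Return (t, df) for H0: mean(hypothetical - actual) = 0."""
    diffs = [h - a for h, a in zip(hypothetical_scores, actual_scores)]
    n = len(diffs)
    # Standard error of the mean difference (sample standard deviation).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Invented scores for three students (not from the study).
t, df = correlated_t([13, 12, 16], [12, 10, 13])
print(round(t, 4), df)
```

The sketch fails with a division by zero if every difference score is identical, a case a fuller implementation would need to handle.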

The data in Table 2 were based upon the assumption of response 
equivalency. To the extent that the assumption was valid, the data were 
a valid means of evaluating the tests which were constructed. It must 
be noted, however, that the data do not attempt to describe a relation- 
ship between an actual administration and a hypothetical administration 
to different students. 
Administration at 
Varying Institutions 

It was impossible to administer all sections of the test under 
development to music education undergraduate students at The Pennsyl- 
vania State University because of the constraints of time. Conse- 
quently, as described in Chapter III, empirical trials of test items 
were also conducted at six other Pennsylvania institutions of higher 
education which offer an undergraduate curriculum in music education. 
When a test was administered at more than one institution, a rank-order 
coefficient of correlation was computed to show the relationship between 
the two sets of rankings (in terms of item difficulty) assigned to the 
items chosen to comprise a twenty-item scale. A low rank-order correla- 
tion coefficient (ρ) would indicate considerable diversity in difficulty 
order of the items. Seven test sections were administered at more than 
one institution; the number of students tested and their division by 
institutions as well as the computed ρ for each test are contained in 
Table 3. 

Examination of Table 3 reveals that four of the seven tests admin- 
istered at more than one institution yielded a ρ greater than .85. Two 
test sections were in the range .70 - .85; the MI test was below .70. 
All are significant beyond the .001 level when one uses the modified t 
test for significance of rank-order correlation suggested by Bruning and 
Kintz,^3 but ρ = .90 was considered more desirable than ρ = .70. 

^3 James L. Bruning and B. L. Kintz, Computational Handbook of 
Statistics (Glenview, Illinois: Scott, Foresman and Company, 1968), 
pp. 158-159. 
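
A rank-order correlation between two sets of item difficulties, and a t statistic for its significance, can be sketched as follows. Tied difficulties receive average ranks, and the t statistic is the usual approximation with n - 2 degrees of freedom; whether this matches the handbook's "modified t test" in every detail is an assumption. All names and sample values are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Spearman's rho, computed as Pearson's r on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def rho_t(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho**2)), with n - 2 df."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Invented difficulty indices for four items at two institutions.
rho = rank_order_correlation([0.9, 0.8, 0.7, 0.6], [0.95, 0.85, 0.5, 0.4])
print(round(rho, 2), round(rho_t(0.85, 20), 2))
```

With twenty items per scale the test has eighteen degrees of freedom, which is why even a ρ near .70 can be significant beyond the .001 level while still being judged less desirable than .90.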







DATA FROM FINAL ADMINISTRATION 

The final administration^4 occurred October 5-9, 1970. The com- 
puterized test and the parallel conventional test were compared through 
the Medley procedure, discussed earlier. Comparisons between the scores 
of lower-term and upper-term students were made; responses to a ques- 
tionnaire appended to both versions were studied. 

Medley Procedure 

The Medley procedure is illustrated through Table 4 which summa- 
rizes the procedure for the ON test administered to the total number of 
students (thirty-two in the computerized version, twenty-eight in the 
conventional version). An analysis of variance was performed for the 
group that received the computerized version, the group that received 
the conventional version, and the combined groups. These analyses of 
variance partitioned the total variance into variance attributable to 
differences among students, differences among item means, and error. 

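The sums of squares (SS) for the components of variance were computed, 
as Medley suggested, in accordance with Hoyt's formulas, 

$$\text{SS among students} = \frac{1}{n}\sum t_i^2 - \frac{\left(\sum t_i\right)^2}{nk},$$ 

$$\text{SS among items} = \frac{1}{k}\sum p_j^2 - \frac{\left(\sum t_i\right)^2}{nk},$$ 

and 

$$\text{total SS} = \frac{\left(\sum t_i\right)\left(nk - \sum t_i\right)}{nk},$$ 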

^4 The term "final administration" means final with regard to the 
research reported herein. The reader should not conclude that com- 
puterized testing of nonperformance musical behaviors has had its final 
hour. 
Table 4 

Medley Procedure for ON Test, All Students 

Source of Variation                             DF          SS        MS 

A. Group receiving computerized version 
   (1) Students                                 31     29.9984    0.9677 
   (2) Items                                    19     57.4984    3.0262 
   (3) Error                                   589     62.2516    0.1057 
   (4) Total                                   639    149.7484 

B. Group receiving conventional version 
   (5) Students                                 27     16.8214    0.6230 
   (6) Items                                    19     27.0785    1.4252 
   (7) Error                                   513     80.3215    0.1566 
   (8) Total                                   559    124.2214 

C. Combined groups 
   (9) Students                                 59     47.3292    0.8022 
  (10) Items                                    19     77.9292    4.1015 
  (11) Error                                  1121    149.2208    0.1331 
  (12) Total                                  1199    274.4792 

D. Analysis of equivalence 
  (13) Groups (9 - 1 - 5)                        1      0.5094    0.5094 
  (14) Students (1 + 5)                         58     46.8198    0.8072 
  (15) Items (10)                               19     77.9292    4.1015 
  (16) Error between versions (11 - 3 - 7)      19      6.6477    0.3499 
  (17) Error within versions (3 + 7)          1102    142.5731    0.1294 
  (18) Total (12)                             1199    274.4792 

Test for Criterion 1: Are students ranked in same order? 
     F = 2.7040; p < .005, criterion not met. 

Test for Criterion 2: Are variances of errors of measurement equal? 
     p < .005, criterion not met. 

Test for Criterion 3: Are variances of obtained scores equal? 
     NS, criterion met. 

Test for Criterion 4: Are means equal? 
     F = 1.5846; NS, criterion met. 

where n represents the number of items (twenty in Table 4), k repre- 
sents the number of students (thirty-two, twenty-eight, and sixty for 
groups A, B, and C, respectively, in Table 4), t_i represents any par- 
ticular student's score, and p_j represents the particular number of 
correct responses to any particular item.^5 The data necessary for use 
of the Hoyt formulas were readily obtainable from the typewritten score 
summary and student records provided by the computer for the computer- 
ized version or the test papers for the conventional version. 
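
The partitioning of variance by Hoyt's formulas can be sketched for a small matrix of scored responses. The function name and the three-student, two-item matrix are assumptions made for illustration; the symbols n, k, t, and p follow the definitions just given, and the error term is obtained by subtraction.

```python
def hoyt_sums_of_squares(matrix):
    """Partition the variance of a students-by-items matrix of 0/1 scores."""
    k = len(matrix)                          # number of students
    n = len(matrix[0])                       # number of items
    t = [sum(row) for row in matrix]         # each student's score
    p = [sum(col) for col in zip(*matrix)]   # correct responses per item
    grand = sum(t)
    correction = grand ** 2 / (n * k)
    ss_students = sum(x * x for x in t) / n - correction
    ss_items = sum(x * x for x in p) / k - correction
    ss_total = grand * (n * k - grand) / (n * k)
    return {
        "students": ss_students,
        "items": ss_items,
        "error": ss_total - ss_students - ss_items,
        "total": ss_total,
    }


# Invented matrix: three students (rows) by two items (columns).
ss = hoyt_sums_of_squares([[1, 1], [1, 0], [0, 0]])
print({name: round(value, 4) for name, value in ss.items()})
```

Dividing each sum of squares by its degrees of freedom (k - 1 for students, n - 1 for items, and (k - 1)(n - 1) for error) gives the mean squares used in the F tests.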

After partitioning of the variance into components for each 
testing group and the combined group, the analysis of equivalence was 
made. Section D of Table 4 includes the quantities, indicated in 
parentheses after the names of the sources of variation, which were 
added or subtracted, in accordance with the Medley procedure, to obtain 
the degrees of freedom and SS figures for Section D. For example, the 
degrees of freedom and SS for students were found by adding the appro- 
priate quantities for (1), variation attributable to students who 
received the computerized version and (5), variation attributable to 
students who received the conventional version. 
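
The additions and subtractions that produce Section D can be reproduced from the degrees of freedom and sums of squares in Sections A through C of Table 4. In this sketch the dictionary layout and the helper name are assumptions; the SS for row 5 is entered as 16.8214, the value that agrees with its mean square of 0.6230 and with the sum shown in row 14.

```python
# (DF, SS) for the rows of Table 4 needed to build Section D.
rows = {
    1: (31, 29.9984), 3: (589, 62.2516),
    5: (27, 16.8214), 7: (513, 80.3215),
    9: (59, 47.3292), 10: (19, 77.9292), 11: (1121, 149.2208),
}


def combine(plus, minus=()):
    """Add the rows in `plus`, subtract the rows in `minus`; return (DF, SS)."""
    df = sum(rows[i][0] for i in plus) - sum(rows[i][0] for i in minus)
    ss = sum(rows[i][1] for i in plus) - sum(rows[i][1] for i in minus)
    return df, round(ss, 4)


section_d = {
    "groups (9 - 1 - 5)": combine([9], [1, 5]),
    "students (1 + 5)": combine([1, 5]),
    "items (10)": combine([10]),
    "error between versions (11 - 3 - 7)": combine([11], [3, 7]),
    "error within versions (3 + 7)": combine([3, 7]),
}
for source, (df, ss) in section_d.items():
    print(f"{source:38s}{df:6d}{ss:12.4f}")
```

The printed degrees of freedom and sums of squares correspond to rows 13 through 17 of the table.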

Mean squares (MS), obtained by dividing SS by the appropriate 
degrees of freedom, provided the needed quantities for the four F tests 
used to test the four criteria for equivalence. Criterion one, ranking 
of students in the same order by each version of the test, or homogene- 
ity of function, was tested by comparing MS(16) with MS(17). For the ON 
test as it was administered to all students, the F value obtained in 
testing for criterion one is significant beyond the .005 level; the 
hypothesis that there is no difference in the ranking of students yielded 
by the two versions must be rejected, and criterion one is not satis- 
fied. Criterion two, equality of variances of errors of measurement, 
was tested by comparing MS(3) with MS(7); in the case of the ON test, 
illustrated by Table 4, this criterion was also not met. Criterion 
three, equality of variance of obtained scores from the two versions, 
was met; it was tested by comparing MS(1) with MS(5), and the obtained 
F value was not significant. Criterion four, equality of means, was 
tested by comparing MS(13) with MS(14); the ON test evidently met this 
criterion. 

^5 Cyril Hoyt, "Test Reliability Obtained by Analysis of Variance," 
Psychometrika, VI (June, 1941), 154. 
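
The F ratios for the ON test can be recomputed from the mean squares printed in Table 4. The pairings for criteria one and four reproduce the F values printed under the table (2.7040 and 1.5846); the pairings shown for criteria two and three, and the practice of placing the larger mean square in the numerator, are assumptions made for this sketch.

```python
# Mean squares as printed in Table 4, keyed by row number.
ms = {
    1: 0.9677, 3: 0.1057, 5: 0.6230, 7: 0.1566,
    13: 0.5094, 14: 0.8072, 16: 0.3499, 17: 0.1294,
}


def f_ratio(numerator_row, denominator_row):
    """Return MS(numerator) / MS(denominator), rounded to four places."""
    return round(ms[numerator_row] / ms[denominator_row], 4)


print("Criterion 1:", f_ratio(16, 17))   # 2.704
print("Criterion 4:", f_ratio(14, 13))   # 1.5846
# Assumed pairings; the corresponding printed F values are not legible.
print("Criterion 2:", f_ratio(7, 3))
print("Criterion 3:", f_ratio(1, 5))
```

Each ratio would then be referred to an F table with the degrees of freedom of the two rows involved.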

Summary tables, similar to Table 4, will be found in Appendix B for 
applications of the Medley procedure to the four programmed tests for 
the total number of students, the lower-term students only, and the 
upper-term students only. Table 5, a summary of all the applications, 
indicates that no test met all criteria; YES indicates a non-significant 
F value, and NO indicates a significant F value. The equality of means 
criterion was most frequently met; only the HC test failed. The other 
criteria were met either rarely or never. 

No section of the computerized test may be said to be equivalent to 
its corresponding conventional section. The process of computerization 
with its incremental feature may be said to have distorted the test 
beyond the point of equivalency. But what is the practical meaning of 
the lack of equivalency? 

To fulfill criterion one, both versions of the test should rank the 
students in the same order. Item differences should interact no more 
with differences among one group of students than with differences among 
another group of students. But if such interaction does differ, as it 
did in ten of twelve cases, how critical are the differences? The 
difference in the ranking effect of identical items in the two versions 
may have been attributable to different rank orders in terms of item 
difficulty, as will be presented below. Since the purpose of the 
criterion-referenced test was not to rank students, criterion one may 
have less significance for a criterion-referenced test than for a norm- 
referenced test. 

Table 5 

Summary of Medley Procedure Applications 
Indicating Presence of Equivalence of 
Tests According to Four Criteria 

Fulfillment of criterion two requires equality of the errors of 
measurement which occur in any measurement situation. The assumption 
of responses to nonadministered items based upon responses to admin- 
istered items in the computerized version introduced systematic error to 
the extent that the assumed rank of the nonadministered items in terms 
of difficulty differed from their actual rank. The complete lack of 
attainment of criterion two is one serious flaw in the test as it was 
administered. 

Equality of variances of obtained scores, criterion three, occurred 
only for the ON test for the three groupings of students, and for the 
HC test for upper-term students. Failure to meet this criterion may, 
again, be traced to inaccurate positions of items in the twenty-item 
scales. Difficult items toward the supposedly easy end of a scale could 
have caused premature terminations of a computerized test section; easy 
items toward the supposedly difficult end would not have been reached by 
terminated students but would have been presented to students who 
received the conventional version. 

Equality of means occurred for all test sections except the HC test. 
In considering the two modes of test presentation, equality of means 
might lead to the conclusion that, on the average, the differing test 
versions would have given identical scores and facilitated identical 
interpretations of those scores with regard to what, if any, action 
should be instigated as a result of the scores. However, the computerized 
score of a particular student might not be representative of his 
status regarding the musical behaviors being measured. Equality of 
means accompanied by nonequality of variances of obtained scores may 
have resulted from a balance between students who received the 
computerized version and were terminated prematurely and students who 
spuriously received credit for correct responses to nonadministered items. 
Again, this is related to the discrepancy between presumed rankings of 
item difficulty and actual rankings in the testing situation. 

All Medley criteria call for comparisons of variances which should 
lack statistically significant differences. The researcher believes 
that the significant differences observed are related to the divergency 
between expected and actual rankings of test items in terms of 
difficulty. 

Comparison of Item 
Difficulty Rankings 

Empirical trials were conducted to establish item difficulty 
indices. Items were selected to form twenty-item scales for each test 
section which was administered to at least twenty-five students; the 
difficulty indices for selected items are reported in Table 1 above. 

The strategy was to develop tests in twelve areas related to 
nonperformance musical behaviors. (Concern for the refinement of 
programming strategy and the constraints of time were responsible for the 
reduction of the number of test sections programmed to four.) Hence, 
many test items were administered to small groups of students during 
empirical trials to establish item difficulty indices. The instability 
of those indices may have been responsible for the quantitative 
difficulties with the test. 

Table 6 contains the estimated item difficulty indices for the four 
twenty-item scales; the estimates are, of course, the difficulties 
obtained from the empirical trials. The observed difficulties for the 
computerized version and the conventional version,^ computed in the 
usual manner, are also contained in the table. Item difficulty figures 
for the computerized test are partially based on assumed responses. 
Discrepancies occur in certain instances, for example, the eighteenth 
ORI item, the sixth FI item, and the fourteenth HC item. Some items, 
of course, such as the fourth ON item and the seventeenth ORI item have 
very similar figures. 

The rank order of item difficulties varies from scale to scale. 
Ideally, the coefficient of rank-order correlation rho (ρ) should be 
1.00 between any two sets of item difficulty indices for one test 
section. Rank-order correlations are reported in Table 7; the cor- 
relation between the estimated difficulty indices and the observed 
indices from administration of the conventional version varies from .43 
to .87. 
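
The rank-order comparison described above can be sketched in Python. This is only an illustration, assuming the coefficient is Spearman's rho computed from ranked difficulty indices; the function names and the sample indices are hypothetical and are not taken from Table 6 or Table 7.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when all values in a set are equal")
    return cov / (vx * vy) ** 0.5


# Hypothetical difficulty indices for a five-item scale (illustration only).
estimated = [0.90, 0.80, 0.65, 0.50, 0.30]
observed = [0.85, 0.60, 0.70, 0.45, 0.35]
print(spearman_rho(estimated, observed))
```

A value of 1.00 would mean the two sets order the items identically; any swap in order, as between the second and third hypothetical items here, lowers the coefficient.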

Less than perfect rank order of item difficulties means that for 
the computerized version students received credit for nonadministered 
items of a difficulty level greater than those administered items which 
were answered correctly to permit that credit for nonadministered items. 
For example, if a student answered the first two items presented in the 
computerized ORI test section, items four and eight, correctly, he 
earned eight points and was ready for item twelve, but item three, 
according to the difficulty estimate from the conventional version, 
was more difficult than item four, and items five, six, and seven were 
more difficult than item eight. Assuming that the item difficulty 
indices computed from administration of the conventional version were 
accurate estimates of the difficulty of the items for those who 
received the computerized version, nonincrementalization (i.e., 
administering all items in the computerized version to all students in a 
linear manner) would have made possible a greater degree of equivalence. 

^Item difficulties are reported on the basis of administration to 
the total number of students taking each version because there was no 
distinction between students regarding class standing during the 
empirical trials. 

Discrepancies Between Estimated and Observed Item Difficulties 

Comparison of Test Performance 
of Upper-term and Lower-term Students 

If the skills measured by the ON, ORI, FI, and HC tests are 
increased during the undergraduate training of the music education stu- 
dent at The Pennsylvania State University, the mean performance of the 
upper-term students should have been greater than the mean performance 
of the lower-term students. Greater upper-term mean scores could 
indicate that what was tested was pertinent to the present focus of 
the curriculum. 

Table 8 reveals that, with one exception, the mean score for 
lower-term students was always lower than the mean score for upper-term 
students; however, in only one instance was the difference statistically 
significant according to a t test. Upper-term students differed only 
[slightly] from lower-term students in their mean ability to identify 
notes missing from a passage, detect an accurate explanation of rhythmic 
inaccuracy, choose an explanation of the departure from tasteful 
interpretation, and classify musical examples by periods of music 
history. 

Failure to find greater differences between the mean test scores of 
upper-term and lower-term students may be attributed to a possible lack 
of curricular experience directed toward improvement of the skills 
measured. It may be attributed to a possible lack of relevancy to 
coursework on the part of the test; however, it was not intended 
to develop a test within the confines of the present course structure. 

Questionnaire Results 

A seven-item questionnaire was appended to each test version. 
Student opinion was sought regarding relative difficulty of the test 
sections, quality of sound and notation, speededness of the test, 
pressure placed on the student, and preferred version. Students who 
received the computerized version answered the multiple-choice questions 
using the light pen; students who received the conventional version 
wrote their answers. All students in each group answered each 
question with one response only. Tables 9 through 15 summarize the 
questionnaire responses in terms of proportions of the students 
indicating each response. The questionnaire items are presented in 
Appendix C. 

There was no particular expectancy regarding the test sections 
considered the most or the least difficult; findings are reported 
in Tables 9 and 10. In each case the trend is more clear for the 
conventional than for the computerized version. The students who 
received the computerized version did not know the number of items in 
each section, since the number of items presented to any student varied 
with the student's performance in accordance with the incremental 
programming strategy. All students who received the conventional 
version received each test item and thereby had a greater number of items 
upon which to base a decision regarding difficulty. Neither test 
version gave knowledge of results to any student prior to administration 
of the questionnaire; no student's estimation of section difficulty was 
influenced by any knowledge of his relative success among the sections. 

The quality of sound reproduction in the computerized version of 
the test was of concern. The IBM 1506 audio unit, the tape playback 
component of the IBM 1500 Instructional System, always contained white 
noise, a constant background hissing sound, while musical excerpts were 
played. Deihl noted this hissing sound as well as bubbling sounds, 
apparently caused by momentary disruption of the uniform movement of the 
tape during a stage of audio cartridge preparation, and variance in 
sound quality between tracks of the tape.^ These unmusical qualities, 
plus occasional static, raised the possibility that students might find 
certain items difficult to answer for an extraneous reason. 

It was expected, therefore, that students who received the 
conventional version of the test, with its tape recorded at 7.5 ips, one 
generation removed from the original recordings, would evaluate the 
sound quality to be at a higher level than the students who received the 
computerized test, with its tape recorded at 1.875 ips, two generations 
removed from the original recordings, because of the extraneous noise on 
the 1506 tape cartridge. But, as indicated in Table 11, a greater 
proportion of students to whom the computerized version was administered 
chose the most favorable response. This was not expected by the 
researcher; perhaps students, while they listened for relevant cues with 
which to select an answer, were more oblivious to extraneous noise in an 
individualized situation, or perhaps the wearing of headphones had some 
influence. 

^Ned C. Deihl, Development and Evaluation of Computer-Assisted 
Instruction in Instrumental Music, Project No. 7-0760, ERIC No. 
ED 035 31C (Washington: Office of Education, U. S. Department of 
Health, Education, and Welfare, 1969), p. 36. 

Table 12 summarizes the questionnaire responses regarding the 
quality of the notation. It was expected that few students receiving 
the computerized version would find the professionally processed film 
exposures of painstakingly drawn music manuscript to be of low quality. 
The mimeographed notation sheets used by the students who received the 
conventional version of the test were certainly not illegible, but were 
not comparable to printed music. 

Perceived speededness of the test versions was of interest. The 
medium of computer-assisted instruction appears to lend itself well to 
individualization of presentation; rates of presentation of material can 
be varied greatly to accommodate students of varying work habits and 
abilities. It is possible to program presentations for student control; 
the material appearing on the cathode ray tube need not change until the 
change is requested by the student. Unlimited allowances for time to 
respond are not considered desirable in the computerized test under 
discussion, but a full forty-five seconds is allowed between the time 
the playback of a musical excerpt concludes and the time the student is 
automatically considered to not know the answer.^ Forty-five seconds 
was believed to be sufficient for virtually any student, but the time 
allotments for the conventional version--eight, twenty, twenty, and 
eight seconds respectively for the ON, ORI, FI, and HC tests--were 
planned with an average student in mind. 

It was expected that most students who received the computerized 
version would find that their test moved at a comfortable pace while 
more than a few students who received the conventional version would 
find that their test moved either too slowly or too rapidly. The 
expected results were partially found; as Table 13 indicates, most 
students found the speed of the computerized version to be satisfactory. 
It was, however, interesting that more students did not find the con- 
ventional version to be too rapid. 

Assurances were given to all students in each group that the test, 
not the student, was being tested. Nevertheless, the researcher was 
interested in obtaining some indication of tension or pressure felt by 
the students. Unfamiliarity with computers and other electronic 
apparatus might have been conducive to an increase in tension; mere 
placement in a testing situation, in spite of assurances given to the 
student, might have increased tension. Table 14 summarizes the 
questionnaire data regarding perceived tension; it is apparent that the 
very few instances of more than slight tension which occurred were in 
the group who received the conventional version. No particular result 
was anticipated. 



^In the case of the ON test, where students have the option of 
repeating an excerpt once before responding, the forty-five seconds are 
counted in full from the time of conclusion of the second play. 

Too rapidly; there was insufficient time between items.   .031   .357 

Quite tense and agitated; I was constantly being pressured and 
urged to produce answers.   .000   .000 

The final item in the questionnaire asked the student whether, if 
there had been a choice, he would have preferred one version of the test 
over the other version. All students knew prior to the testing dates 
that a computerized version of an experimental test in music was to be 
compared with a conventional version of the same test. No student, of 
course, received both versions; the students were asked to express a 
preference after being familiarized with only one version. The expectation 
was that the majority response for each group would be that it made no 
difference which version the student received, while slightly more than 
half of the remaining responses from each group would indicate a 
preference for the familiar version. Within the group receiving the 
computerized version, the proportion expressing preference for their 
version was the majority. The proportions of responses among the group 
receiving the conventional version were in accordance with expectation. 
Table 15 summarizes the preference data. 

NON-QUANTITATIVE FINDINGS 

The IBM 1500 Instructional System functioned smoothly and effi- 
ciently during all stages of test development. Malfunctions within the 
program were always found to be the result of human error. In all 
instances, a student who was scheduled to be tested could report to the 
Penn State Computer-Assisted Instruction Laboratory, have the operation 
of equipment briefly explained to him, and begin the test within two 
minutes of his arrival. 
There were no problems related to any slowness of the system during 
administration of the test. An excessive amount of input from other 
stations can slow the presentation of material to a student at a given 
instructional station, particularly when the input is an addition or 
replacement of coded instruction. This did not occur.^ 

Numerous students commented that their experience with the 
computerized instructional station was novel, enjoyable, or worthy of 
replication. There was no apparent apprehension regarding the 
equipment. One student stated a desire that all of his tests could be 
administered in the same manner. 

The brief summary of scores printed at a typewriter station by the 
computer at the conclusion of each testing session was always rapidly 
available in the following format: 

STUDENT x17 
ON score is 8 
ORI score is 4 
FI score is 7 
HC score is 3 

If the tests were refined to the point where some action could be taken 
on the basis of the scores, the quick score summary would be very 
beneficial. 

The extensive student records available from the computer provide 
an accurate record of each student's testing session. Information 
contained in student records includes question identifiers, response 
identifiers, student identifiers, time of response, and time elapsed 
between the end of musical excerpt and entry of response. From the 
student records it was easy to obtain data for item analysis and 
determine which items were actually administered to any student. A sample of 
student records is presented in Appendix E. 

^Had it occurred, it could have been alleviated by restricting 
system usage during testing sessions to execution of existing program 
material rather than creation or alteration of material. 



CHAPTER V 



SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS 

This final chapter summarizes the conduct of the research and the 
findings, states conclusions, and presents some recommendations for 
further research. 



SUMMARY 



Objectives 

The framing of valid objectives upon which to build criterion- 
referenced test items was the initial phase of the research. Objectives 
were stated in the form of observable nonperformance musical behaviors. 
Quantitative statements were avoided; objectives were statements of 
skills which were deemed important for display by competent music 
education graduates. Areas included by the objectives, not intended to 
be an all-inclusive statement of desirable nonperformance musical 
behaviors, were: 

aural recognition and identification of melodic 
intervals; 

aural recognition and identification of harmonic 
intervals; 

aural recognition and classification of triads; 

insertion of missing notes into visual notational 
displays of aurally perceived melodies; 

recognition and location of aural-visual pitch 
discrepancies in four-part harmonic passages; 

recognition and location of aural-visual rhythmic 
discrepancies; 

selection of appropriate explanations of incorrectly 
performed rhythmic patterns; 

recognition and location of incorrectly notated 
measures for given meter signatures; 

selection of members of pairs of examples that are 
performed "better" when "better" refers to tapered phrase 
endings, dynamics, appropriateness of breathing, or 
appropriateness of articulation style; 

identification and classification of inappropriateness 
of interpretation when the inappropriateness is due to 
inappropriate tempo, inappropriate articulation, excessive 
rubato, lack of rubato, or inappropriate dynamics; 

classification of musical examples as being 
stylistically representative of the Baroque, Classical, 
Romantic, or Modern Period; 

classification of musical examples as being 
stylistically representative of acid rock, soul, country- 
western, pop standard, "bubble gum," folk, folk rock, or 
blues. 

Test Items 

Multiple-choice items were constructed in accordance with the above 
objectives using orchestral excerpts, chorales, and pedagogical litera- 
ture. Items were notated, recorded, and prepared for empirical trial to 
establish item difficulty indices. 
Nine test sections were subjected to empirical trials at The 
Pennsylvania State University and six other Pennsylvania institutions of 
higher education offering an undergraduate music education curriculum. 
Twenty-item scales arranged in order of difficulty were selected from 
each test section. Actual student performances on those scales were 
compared with hypothetical performances which would have resulted from 
equivalent responses to those scales as they would have been presented 
through a proposed programming strategy. On the basis of the empirical 
trials and descriptive statistics obtained from the actual-hypothetical 
comparisons, and in consideration of the four basic areas of pitch, 
rhythm, interpretation, and style, four test sections were selected for 
programming. The selected test sections were the Omitted Notes, 
Overall Rhythmic Inaccuracies, Faulty Interpretations, and Historical 
Classification sections, related to the fourth, seventh, tenth, and 
eleventh of the objectives summarized above. 

Programming 

The selected items were programmed in the Coursewriter II 
language for the IBM 1500 Instructional System. An incremental programming 
strategy was utilized; a student began each computerized test section 
with the fourth item of the twenty-item scale. A correct response 
branched the student to the eighth item; the student continued to move 
ahead in increments of four items until an initial erroneous response 
occurred or the twentieth item was answered correctly. An initial 
erroneous response caused a reverse branch of three items; e.g., if a 
student was unable to answer item twelve correctly, he was branched to 
item nine. From the point reached by the reverse branch after the 
initial erroneous response, the student moved ahead along the scale in 
a linear manner. A test section was terminated for a student when he 
reached the end of the scale, made three erroneous responses in 
succession, or made a total of five erroneous responses. His score was the 
number of test items actually answered correctly plus the number of test 
items assumed to be answered correctly. Nonadministered items were 
assumed to be answered correctly if they were lower on the scale (i.e., 
were of less difficulty) than the highest administered item on the scale 
that was answered correctly. 
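
The branching and scoring rules just summarized can be sketched as a small Python simulation. It is not the Coursewriter II program. The function name and the `answers_correctly` callback are hypothetical, and two details the text does not specify are assumed: an item already administered is skipped during the linear phase, and the initial erroneous response counts toward both termination limits.

```python
def administer_section(answers_correctly, scale_length=20):
    """Simulate one test section.

    answers_correctly(item) -> bool tells whether the student answers the
    item (numbered 1..scale_length) correctly. Returns (score, administered).
    """
    administered = {}      # item number -> True/False for items actually presented
    errors_total = 0
    errors_in_row = 0

    def present(item):
        nonlocal errors_total, errors_in_row
        ok = answers_correctly(item)
        administered[item] = ok
        if ok:
            errors_in_row = 0
        else:
            errors_total += 1
            errors_in_row += 1
        return ok

    # Incremental phase: start at item 4 and jump ahead four items per correct answer.
    item = 4
    while item <= scale_length and present(item):
        item += 4

    if item <= scale_length:
        # An initial error occurred at `item`: branch back three items, then go linearly.
        item -= 3
        while item <= scale_length and errors_in_row < 3 and errors_total < 5:
            if item not in administered:   # assumption: no item is presented twice
                present(item)
            item += 1

    correct = [i for i, ok in administered.items() if ok]
    highest_correct = max(correct, default=0)
    # Nonadministered items below the highest correct item are assumed correct.
    assumed = [i for i in range(1, highest_correct) if i not in administered]
    return len(correct) + len(assumed), administered


# A student who can answer items 1 through 8 only: items 4 and 8 are answered
# correctly, item 12 is missed, and items 9 and 10 are then missed in succession.
score, seen = administer_section(lambda item: item <= 8)
print(score, sorted(seen))
```

Under these assumptions the student in the usage example ends with two items actually answered correctly and six credited by assumption, the same eight points as in the ORI example given earlier for a student who answers items four and eight correctly.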

Administration and Findings 

The computerized test was administered to eighteen lower-term and 
fourteen upper-term undergraduates enrolled in the music education 
curriculum at The Pennsylvania State University during the week of 
October 5-9, 1970. A parallel conventional version of the test was 
administered to sixteen lower-term and twelve upper-term students to 
provide a check on the item difficulties and a basis for a comparison of 
test equivalence. Students who received the computerized version worked 
at an instructional station; they heard the musical stimuli through 
headphones, read the test questions on the cathode ray tube screen, 
viewed musical notation on the image projector, and answered questions by 
indicating their choices with a light pen. Students who received the 
conventional version were seated in a classroom; they read the questions 
and answered them on mimeographed test forms, viewed musical notation on 
mimeographed notation sheets, and heard the musical excerpts through the 
speakers of the tape recorder. 
The Medley procedure, a series of F tests for equal variances, was 
utilized to test for equivalence of the two versions in accordance with 
four criteria: equal ranking of students, equality of variances of 
errors of measurement, equality of variances of obtained scores, and 
equality of means. Although the equality of means criterion was 
generally met, the others were not; the two versions of the test may 
not be considered equivalent. 

Neither the computerized nor the conventional version of the test 
showed any significant difference between the mean scores of upper-term 
and lower-term students. It was not clear that this was a weakness of 
the test because the students' curricular experiences may not be 
directed toward improvement of the skills measured. 

The weakness of the test, preventing its immediate implementation, 
is the discrepancy between the estimated item difficulty indices, 
established as a result of the empirical trials of test items, and the 
actual item difficulty indices, computed from the conventional version 
scores. This discrepancy caused assumptions regarding correctness of 
nonadministered items in the computerized version to be less than 
accurate. 

The computerized test was well received by the students to whom it 
was administered. The equipment functioned smoothly, and audio weak- 
nesses present in the IBM 1500 Instructional System did not appear to 
have any adverse effect upon the test. 
CONCLUSIONS 

Four conclusions may be drawn from the present study: 

1. Present skills, techniques, and equipment are 
adequate for the construction of a workable computerized 
criterion-referenced test of certain nonperformance musical 
behaviors. 

2. Rank order of items, in terms of item difficulty, 
is critical to the success of an incremental programming 
strategy in computerized testing wherein assumptions are 
to be made regarding responses to nonadministered items. 

3. The computerized criterion-referenced test of 
certain nonperformance musical behaviors is not equivalent 
to a conventional noncomputerized version of the test. 

4. Differentiation of mean scores between lower-term 
and upper-term students is minor and generally 
nonsignificant; it is uncertain whether this is a 
function of the test or lack of significant growth in 
the skills measured. 



RECOMMENDATIONS 



Further research is recommended to refine the computerized test and 
increase its potential utility for The Pennsylvania State University and 
its paradigmatic value for other institutions. More accurate item 
difficulty indices are required; perhaps the empirical establishment of 
such figures could be preceded or supplemented by rational study of the 
musical behaviors involved. Additional objectives should probably be 
formulated and new types of test items constructed from them. Alternate 
programming strategies might be actually programmed and compared. 
Specifically, the following recommendations are made: 

1. Existing test items should be administered to 
large groups (N = 200) of undergraduate music education 
majors in order to obtain more accurate estimates of 
item difficulties. 

2. In some cases, the grouping of test items 
according to difficulty might be approached by analyzing 
the behaviors involved in responding to the items and 
establishing an ordered series of prerequisite behaviors. 

3. Additional objectives related to nonperformance 
musical behaviors should be formulated and test items 
constructed; however, this should not precede the 
strengthening of existent items. 

4. After the reordering of test items on the basis 
of stronger estimates of difficulty, a three-group study 
should be conducted to compare the relative merits of 
1) a computerized test programmed in a manner identical 
to the test developed in the study reported herein, 
2) a computerized test programmed following a differing 
strategy, and 3) a parallel conventional version of the 
test. 

At the beginning of the first chapter it was stated that the basic 
purpose of the study was to develop a prototype computerized 
criterion-referenced test for measuring competencies in certain nonperformance 
musical behaviors present in undergraduate students commencing their 
course of study in music education. The prototype has been largely 
developed. If the recommendations can be implemented, a new and useful 
instrument will exist. 
SAMPLE ITEMS 



Two sample items are included from each of the four test sections 
which were programmed. The questions, the answer arrays, the notational 
displays, and the contents of the recorded excerpt are indicated for 
each item. Content of the item was identical for each version of the 
test. The reader will recall that in the computerized version, nota- 
tional displays appeared on the image projector, questions and answer 
arrays appeared on the cathode ray tube, and the recorded music was 
heard through individual headphones. In the conventional version, the 
recorded music was played on a tape recorder for a group; the visual 
material was mimeographed. 



Omitted Notes, Item No. 5 



Question 

What is the name of the missing note? 
Answer Array 




Notational Display 








Contents of Recorded Excerpt 
Omitted Notes, Item No. 14 

Question 

What is the name of the missing note? 

Answer Array 

Notational Display 

Contents of Recorded Excerpt 






(Played on trumpet) 



Overall Rhythmic Inaccuracies, Item No. 9 

Question 

What is wrong with the rhythmic performance of this excerpt? 
Answer Array 

A. The tempo accelerates. 

B. The thirty-second notes are played as sixty-fourth notes. 

C. Unwritten ties are added. 

D. The [note symbol]-groups are played as [note symbol]-groups. 

Notational Display 
Allegretto in 8 

Contents of Recorded Excerpt 
in Strict Rhythm 








(Played on clarinet) 
Overall Rhythmic Inaccuracies, Item No. 20 

Question 

What is wrong with the rhythmic performance of this excerpt? 
Answer Array 

A. The tempo accelerates. 

B. The tempo decelerates. 

C. The quarter notes are played as half notes. 

D. There is nothing wrong with the rhythmic performance. 

Notational Display 

Contents of Recorded Excerpt 

Faulty Interpretations, Item No. 4 



Question 

What is wrong with the performer's interpretation of this melody? 
Answer Array 

A. The rubato is excessive. 

B. The rubato is insufficient. 

C. The tempo is inappropriate. 

D. The articulation is incorrect. 



Notational Display 
Faulty Interpretations, Item No. 11 

Question 

What is wrong with the performer's interpretation of this melody? 
Answer Array 

A. The tempo is inappropriate. 

B. The articulation is incorrect. 

C. The dynamics are unobserved. 

D. The rubato is excessive. 

Notational Display 






Contents of Recorded Excerpt 




(Played on trumpet) 



Historical Classification, Item No. 2 

Question 

Is this excerpt most representative of the Baroque, Classical, 
Romantic, or Modern Period? 

Answer Array 
B C R M 

Notational Display 
(none) 

Contents of Recorded Excerpt 

Excerpt from first movement of Trio Sonata in F Minor, by Sammartini. 

Historical Classification, Item No. 17 

Question 

Is this excerpt most representative of the Baroque, Classical, 
Romantic, or Modern Period? 

Answer Array 
B C R M 

Notational Display 
(none) 

Contents of Recorded Excerpt 

Excerpt from second movement of Symphony No. 1 , by Mahler. 
APPENDIX B 



SUMMARY TABLES FOR MEDLEY PROCEDURE DATA 






Medley Procedure for Omitted Notes, 
All Students 



Source of Variation 


OF 


SS 


MS 


A. Group receiving computerized version 








(1) Students 


31 


29.9984 


0.9677 


(2) Items 


19 


57.4984 


3.0262 


(3) Error 


589 


62.2516 


0.1057 


(4) Total 


m 


149.T484 




B. Group receiving conventional version 








(51 Students 


27 


16.8214 


0.6230 


(6) Items 


19 


27.0785 


1 .4252 


(7) Error 


513 


80.3215 


0.1566 


(8) Total 




Ta.2114 




C. Combined groups 








(9) Students 


59 


47.3292 


0.8022 


no Items 


19 


77.9292 


4.1015 


(11) Error 


1121 


149.2208 


0.1331 


(12) Total 


TT5? 


mrm 




D. Analysis of equivalence 








(13) Groups (9-1-5) 


1 


0.5094 


0.5094 


(14) Students (1 + 5) 


58 


46.8198 


0.8072 


(15) Items (10) 


19 


77.9292 


4.1015 


ns) Error between versions (11-3-7) 


19 


6.6477 


0.3499 


(17) Error within versions (3 + 7) 


1102 


142.5731 


0.1294 


(18) Total (12) 


TTW 


Tnrjm 




Test for Criterion 1: Are students ranked 


in same 


order? 





F » ^(1*1 * 2.7040; p <.00b, cr terlon not met. 



Test for Criterion 2: Are variances of errors of measurement equal? 

“ " ^(jl * P <.005, criterion not met. 

Test for Criterion 3: Are variances of obtained scores equal? 

F « ■ 1 .5533; NS, criterion rot. 

Test for Criterion 4; Are means equal? 

-"^(11) ’ criterion met. 
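As a check on how the criterion tests relate to the summary rows, the sketch below recomputes two of the F ratios for the Omitted Notes, All Students data from the mean squares in the table. The pairing of rows with criteria (rows 16 and 17 for Criterion 1, rows 1 and 5 for Criterion 3) is inferred here from the reported ratios and should be read as an assumption; the function and variable names are illustrative only.

```python
# Mean squares copied from the Omitted Notes, All Students table.
# Keys are the row numbers printed in the table.
MEAN_SQUARES = {
    1: 0.9677,   # Students, computerized version
    5: 0.6230,   # Students, conventional version
    16: 0.3499,  # Error between versions
    17: 0.1294,  # Error within versions
}


def f_ratio(ms, numerator_row, denominator_row):
    """Return the ratio of two mean squares identified by table row number."""
    return ms[numerator_row] / ms[denominator_row]


# Assumed pairings (inferred from the reported values, not stated in the source).
criterion_1 = f_ratio(MEAN_SQUARES, 16, 17)  # are students ranked in same order?
criterion_3 = f_ratio(MEAN_SQUARES, 1, 5)    # are variances of obtained scores equal?

print(f"Criterion 1: F = {criterion_1:.4f}")
print(f"Criterion 3: F = {criterion_3:.4f}")
```

Under these assumptions the two printed ratios agree with the reported values of 2.7040 and 1.5533. Significance levels are not computed in this sketch.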
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Table 17 

Medley Procedure for Omitted Notes,
Lower-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               17     17.9250    1.0544
   (2) Items                                  19     39.0972    2.0577
   (3) Error                                 323     32.3528    0.1002
   (4) Total                                 359     89.3750

B. Group receiving conventional version
   (5) Students                               15     10.7875    0.7192
   (6) Items                                  19     14.3876    0.7572
   (7) Error                                 285     45.7125    0.1604
   (8) Total                                 319     70.8875

C. Combined groups
   (9) Students                               33     31.4485    0.9530
   (10) Items                                 19     47.0867    2.4782
   (11) Error                                627     84.4633    0.1347
   (12) Total                                679    162.9985

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      2.7360    2.7360
   (14) Students (1 + 5)                      32     28.7125    0.8973
   (15) Items (10)                            19     47.0867    2.4782
   (16) Error between versions (11-3-7)       19      6.3980    0.3367
   (17) Error within versions (3 + 7)        608     78.0653    0.1284
   (18) Total (12)                           679    162.9985

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 2.6223; p < .005, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = [illegible]; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = [illegible]; criterion met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 3.0491; NS, criterion met.
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Table 18 

Medley Procedure for Omitted Notes,
Upper-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               13      6.1429    0.4725
   (2) Items                                  19     23.4425    1.2338
   (3) Error                                 247     24.8571    0.1006
   (4) Total                                 279     54.4429

B. Group receiving conventional version
   (5) Students                               11      6.0333    0.5485
   (6) Items                                  19     14.6666    0.7719
   (7) Error                                 209     35.9668    0.1721
   (8) Total                                 239     56.6667

C. Combined groups
   (9) Students                               25     12.7923    0.5117
   (10) Items                                 19     34.2385    1.8020
   (11) Error                                475     61.3615    0.1292
   (12) Total                                519    108.3913

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.6161    0.6161
   (14) Students (1 + 5)                      24     12.1762    0.5073
   (15) Items (10)                            19     34.2385    1.8020
   (16) Error between versions (11-3-7)       19      0.5376    0.0283
   (17) Error within versions (3 + 7)        456     60.8239    0.1334
   (18) Total (12)                           519    108.3913

Test for Criterion 1: Are students ranked in same order?

F = MS(17)/MS(16) = 4.7138; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = [illegible]; p [illegible], criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(5)/MS(1) = 1.1608; NS, criterion met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 1.2145; NS, criterion met.
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I 
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I Table 19 



Medley Procedure for Overall Rhythmic Inaccuracies,
All Students

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               31     32.7609    1.0568
   (2) Items                                  19     62.0171    3.2641
   (3) Error                                 589     60.8329    0.1033
   (4) Total                                 639    155.6109

B. Group receiving conventional version
   (5) Students                               27     10.0339    0.3716
   (6) Items                                  19     31.3768    1.6514
   (7) Error                                 513     93.5732    0.1824
   (8) Total                                 559    134.9839

C. Combined groups
   (9) Students                               59     43.2367    0.7328
   (10) Items                                 19     86.8367    4.5702
   (11) Error                               1121    160.5633    0.1432
   (12) Total                               1199    290.6367

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.4419    0.4419
   (14) Students (1 + 5)                      58     42.7948    0.7378
   (15) Items (10)                            19     86.8367    4.5702
   (16) Error between versions (11-3-7)       19      6.1572    0.3241
   (17) Error within versions (3 + 7)       1102    154.4061    0.1401
   (18) Total (12)                          1199    290.6367

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 2.3133; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = [illegible]; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 2.8439; p < .005, criterion not met.

Test for Criterion 4: Are means equal?

F = [illegible]; criterion met.
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Table 20 

Medley Procedure for Overall Rhythmic Inaccuracies,
Lower-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               17     20.7000    1.2176
   (2) Items                                  19     34.3222    1.8064
   (3) Error                                 323     34.0778    0.1055
   (4) Total                                 359     89.1000

B. Group receiving conventional version
   (5) Students                               15      7.3719    0.4915
   (6) Items                                  19     16.4344    0.8650
   (7) Error                                 285     54.8156    0.1923
   (8) Total                                 319     78.6219

C. Combined groups
   (9) Students                               33     28.1132    0.8519
   (10) Items                                 19     45.7338    2.4070
   (11) Error                                627     93.9162    0.1498
   (12) Total                                679    167.7632

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.0413    0.0413
   (14) Students (1 + 5)                      32     28.0719    0.8772
   (15) Items (10)                            19     45.7338    2.4070
   (16) Error between versions (11-3-7)       19      5.0228    0.2644
   (17) Error within versions (3 + 7)        608     88.8934    0.1462
   (18) Total (12)                           679    167.7632

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = [illegible]; p [illegible], criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = [illegible]; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 2.4773; p < .05, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(14)/MS(13) = 21.2397; NS, criterion met.

Table 21
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Medley Procedure for Overall Rhythmic Inaccuracies, 
Upper-term Students Only 





Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               13     11.5750    0.8904
   (2) Items                                  19     29.4107    1.5479
   (3) Error                                 247     24.6393    0.9975
   (4) Total                                 277     65.6250

B. Group receiving conventional version
   (5) Students                               11      2.0333    0.1848
   (6) Items                                  19     16.7333    0.8807
   (7) Error                                 209     36.9667    0.1769
   (8) Total                                 237     55.7333

C. Combined groups
   (9) Students                               25     13.6173    0.5447
   (10) Items                                 19     42.7904    2.2521
   (11) Error                                475     64.9596    0.1368
   (12) Total                                517    121.3673

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.0090    0.0090
   (14) Students (1 + 5)                      24     13.6083    0.5670
   (15) Items (10)                            19     42.7904    2.2521
   (16) Error between versions (11-3-7)       19      3.3536    0.1765
   (17) Error within versions (3 + 7)        456     61.6060    0.1351
   (18) Total (12)                           517    121.3673

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 1.3064; NS, criterion met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(3)/MS(7) = 5.6388; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 4.8182; p < .01, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(14)/MS(13) = 63.0000; NS, criterion met.
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Table 22 

Medley Procedure for Faulty Interpretations,
All Students

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               31     30.6750    0.9895
   (2) Items                                  19     61.8750    3.2566
   (3) Error                                 589     61.8250    0.1050
   (4) Total                                 639    154.3750

B. Group receiving conventional version
   (5) Students                               27      9.1214    0.3378
   (6) Items                                  19     20.6500    1.0868
   (7) Error                                 513    100.4500    0.1958
   (8) Total                                 559    130.2214

C. Combined groups
   (9) Students                               59     40.2367    0.6820
   (10) Items                                 19     72.3367    3.8072
   (11) Error                               1121    172.4633    0.1538
   (12) Total                               1199    285.0367

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.4403    0.4403
   (14) Students (1 + 5)                      58     39.7964    0.6861
   (15) Items (10)                            19     72.3367    3.8072
   (16) Error between versions (11-3-7)       19     10.1883    0.5362
   (17) Error within versions (3 + 7)       1102    162.2750    0.1473
   (18) Total (12)                          1199    285.0367

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 3.6402; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = 1.8648; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 2.9292; p < .001, criterion not met.

Test for Criterion 4: Are means equal?

F = [illegible]; criterion met.
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Table 23 

Medley Procedure for Faulty Interpretations,
Lower-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               17     18.7472    1.1028
   (2) Items                                  19     34.9194    1.8379
   (3) Error                                 323     35.5306    0.1100
   (4) Total                                 359     89.1972

B. Group receiving conventional version
   (5) Students                               15      6.4969    0.4331
   (6) Items                                  19     12.6844    0.6676
   (7) Error                                 285     56.0656    0.1967
   (8) Total                                 319     75.2469

C. Combined groups
   (9) Students                               33     26.1882    0.7936
   (10) Items                                 19     41.0353    2.1598
   (11) Error                                627     98.1647    0.1566
   (12) Total                                679    165.3882

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.9441    0.9441
   (14) Students (1 + 5)                      32     25.2441    0.7889
   (15) Items (10)                            19     41.0353    2.1598
   (16) Error between versions (11-3-7)       19      6.5685    0.3457
   (17) Error within versions (3 + 7)        608     91.5962    0.1507
   (18) Total (12)                           679    165.3882

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 2.2940; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = 1.7882; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 2.5463; p < .05, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 1.1967; NS, criterion met.
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Table 24 

Medley Procedure for Faulty Interpretations,
Upper-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               13     10.1464    0.7805
   (2) Items                                  19     28.8964    1.5209
   (3) Error                                 247     24.3536    0.9860
   (4) Total                                 279 [illegible]

B. Group receiving conventional version
   (5) Students                               11      2.5458    0.2314
   (6) Items                                  19     12.4791    0.6568
   (7) Error                                 209     39.8709    0.1908
   (8) Total                                 239 [illegible]

C. Combined groups
   (9) Students                               25     12.7000    0.5080
   (10) Items                                 19     34.3000    1.8053
   (11) Error                                475     71.3000    0.1501
   (12) Total                                519    118.3000

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      0.0078    0.0078
   (14) Students (1 + 5)                      24     12.6922    0.5288
   (15) Items (10)                            19     34.3000    1.8053
   (16) Error between versions (11-3-7)       19      7.0755    0.3724
   (17) Error within versions (3 + 7)        456     64.2245    0.1408
   (18) Total (12)                           519    118.3000

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 2.6449; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(3)/MS(7) = 5.1677; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = [illegible]; p < .05, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(14)/MS(13) = 67.7949; NS, criterion met.



Table 25

Medley Procedure for Historical Classification,
All Students

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               31     32.7938    1.0579
   (2) Items                                  19     52.2188    2.7484
   (3) Error                                 589     71.0812    0.1207
   (4) Total                                 639    156.0938

B. Group receiving conventional version
   (5) Students                               27     10.0054    0.3706
   (6) Items                                  19     26.2697    1.3826
   (7) Error                                 513    101.2803    0.1974
   (8) Total                                 559    137.5554

C. Combined groups
   (9) Students                               59     49.0092    0.8307
   (10) Items                                 19     75.9425    3.9970
   (11) Error                               1121    174.9075    0.1560
   (12) Total                               1199    299.8592

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      6.2100    6.2100
   (14) Students (1 + 5)                      58     42.7992    0.7379
   (15) Items (10)                            19     75.9425    3.9970
   (16) Error between versions (11-3-7)       19      2.5460    0.1340
   (17) Error within versions (3 + 7)       1102    172.3615    0.1564
   (18) Total (12)                          1199    299.8592

Test for Criterion 1: Are students ranked in same order?

F = MS(17)/MS(16) = 1.1672; NS, criterion met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = [illegible]; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 2.8546; p < .005, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 8.4158; p < .01, criterion not met.



Table 26

Medley Procedure for Historical Classification,
Lower-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               17     13.5250    0.7956
   (2) Items                                  19     36.7194    1.9326
   (3) Error                                 323     32.5306    0.1007
   (4) Total                                 359 [illegible]

B. Group receiving conventional version
   (5) Students                               15      3.6875    0.2458
   (6) Items                                  19     17.9375    0.9441
   (7) Error                                 285     58.0625    0.2037
   (8) Total                                 319 [illegible]

C. Combined groups
   (9) Students                               33     22.2779    0.6751
   (10) Items                                 19     44.9103    2.3637
   (11) Error                                627    100.3397    0.1600
   (12) Total                                679    167.5279

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      5.0654    5.0654
   (14) Students (1 + 5)                      32     17.2125    0.5379
   (15) Items (10)                            19     44.9103    2.3637
   (16) Error between versions (11-3-7)       19      9.7466    0.5130
   (17) Error within versions (3 + 7)        608     90.5931    0.1490
   (18) Total (12)                           679    167.5279

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 3.4430; p < .001, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = 2.0228; p < .001, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = 3.2368; p < .025, criterion not met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 9.4170; p < .005, criterion not met.
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Table 27 

Medley Procedure for Historical Classification,
Upper-term Students Only

Source of Variation                           DF          SS        MS

A. Group receiving computerized version
   (1) Students                               13     15.9464    1.2266
   (2) Items                                  19     19.6393    1.0336
   (3) Error                                 247     34.4107    0.1393
   (4) Total                                 279     69.9964

B. Group receiving conventional version
   (5) Students                               11      5.4125    0.4920
   (6) Items                                  19     14.8792    0.7831
   (7) Error                                 209     36.6708    0.1755
   (8) Total                                 239     56.9625

C. Combined groups
   (9) Students                               25     22.8923    0.9157
   (10) Items                                 19     28.5692    1.5036
   (11) Error                                475     77.0308    0.1622
   (12) Total                                519    128.4923

D. Analysis of equivalence
   (13) Groups (9-1-5)                         1      1.5334    1.5334
   (14) Students (1 + 5)                      24     21.3589    0.8900
   (15) Items (10)                            19     28.5692    1.5036
   (16) Error between versions (11-3-7)       19      5.9493    0.3131
   (17) Error within versions (3 + 7)        456     71.0815    0.1559
   (18) Total (12)                           519    128.4923

Test for Criterion 1: Are students ranked in same order?

F = MS(16)/MS(17) = 2.0083; p < .01, criterion not met.

Test for Criterion 2: Are variances of errors of measurement equal?

F = MS(7)/MS(3) = 1.2599; p < .05, criterion not met.

Test for Criterion 3: Are variances of obtained scores equal?

F = MS(1)/MS(5) = [illegible]; criterion met.

Test for Criterion 4: Are means equal?

F = MS(13)/MS(14) = 1.7229; NS, criterion met.
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QUESTIONNAIRE ITEMS APPENDED TO BOTH TEST VERSIONS 



1. Of the four sections, I thought that the most difficult section for
me was the

(A) Omitted Notes section

(B) Overall Rhythmic Inaccuracies section

(C) Faulty Interpretation section

(D) Historical Classification section

2. Of the four sections, I thought that the least difficult section for
me was the

(A) Omitted Notes section

(B) Overall Rhythmic Inaccuracies section

(C) Faulty Interpretation section

(D) Historical Classification section

3. The overall quality of sound reproduction was generally

(A) very poor and distracting; it made the questions difficult
to answer.

(B) not good, but it did not interfere with my ability to answer
the questions.

(C) fair; it certainly was adequate for the test.

(D) quite good; it was often enjoyable to listen.

4. The overall quality of the notation was generally

(A) very poor; the illegibility of the notes often made it
difficult to answer questions.

(B) not good, but it did not interfere with my ability to answer
the questions.

(C) not comparable to printed music, but it was certainly
adequate for the purpose.

(D) quite good; it was comparable to printed music in most
respects.
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5. With regard to the speed of the test, I think that the test 
generally moved 



(A) too slowly; there was needless delay between items.

(B) at a comfortable pace for me. 

(C) too rapidly; there was insufficient time between items. 



6. While I was taking the test, I generally felt 



(A) quite calm and relaxed; there was very little pressure
on me.

(B) slightly tense; there was some pressure on me, but it was
largely of my own making.

(C) rather tense; pressure was being placed upon me by the 
testing situation. 

(D) quite tense and agitated; I was constantly being pressured 
and urged to produce answers. 



7. If I had a choice, I would have preferred to take the test in 

(A) the CAI Laboratory, using the computerized instructional 

station in an individualized manner. 

(B) a conventional paper-and-pencil testing situation, as part 

of a group taking the test simultaneously. 

(C) It really made no difference. 
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EXAMPLE CF COURSEWRITER PROGRAMMING 



The example below illustrates use of the Coursewriter II computer
language. Literacy in Coursewriter is required to interpret the
statements, but, essentially, the computer is told what alphameric
characters to display and where on the cathode ray tube screen to
display them, what student responses to expect and what action is to
be taken for each response, what scores to store and where to store
them, when to play a tape segment or display an image, when to query a
student, and how long to allow for his response. The example includes
the programming for the ninth, tenth, eleventh, and twelfth FI items.



FI9*E
    PR   *E
    DE   0+/32*E
    FPI  99*E
    DT   0,5+/4,0+/40,0+/(W)HAT IS WRONG WITH THE PERFORMER'S*C*I
         INTERPRETATION OF THIS MELODY(/*E
    DT   7,5+/2,7+/35,5+/+, (T)HE ARTICULATION IS INCORRECT. *E
    DT   13,5+/2,13+/35,5+/+, (T)HE DYNAMICS ARE UNOBSERVED. *E
    DT   19,5+/2,19+/35,5+/+, (T)HE TEMPO IS INAPPROPRIATE. *E
    DT   25,5+/2,25+/35,5+/+, (T)HE RUBATO IS INSUFFICIENT. *E
    PA   70*E
    AUP  FI9*E1040,0+/48*E
    EPP  450+/QUFI9*E
    NX   *E
    BR   PR1*E
    CAP  4,12,3,4+/CC*E
    SB   C6+/C6*E
    AD   1+/C3*E
    BR   PR2*E
    WAP  4,6,3,4+/W1*E
    WBP  4,18,3,4+/W3*E
    WBP  4,24,3,4+/W4*E
    BR   PR1*E
    UN   UU*E
    DT   28,7+/2,28+/33,7+/(T)OUCH ONLY A +,.*E
    PA   40*E
    DE   28+/2*E
    PR   *E
    AD   1+/C5*E
    AD   1+/C6*E
    BR   HCTR1+/C6+/E+/3*E
    BR   HCTR1+/C5+/E+/5*E

FI10*E
    PR   *E
    DE   0+/32*E
    FPI  100*E
    DT   0,5+/4,0+/40,0+/(W)HAT IS WRONG WITH THE PERFORMER'S*C*I
         INTERPRETATION OF THIS MELODY(/*E
    DT   7,5+/2,7+/35,5+/+, (T)HE TEMPO IS INAPPROPRIATE. *E
    DT   13,5+/2,13+/35,5+/+, (T)HE RUBATO IS INSUFFICIENT. *E
    DT   19,5+/2,19+/35,5+/+, (T)HE RUBATO IS EXCESSIVE. *E
    DT   25,5+/2,25+/35,5+/+, (T)HE DYNAMICS ARE UNOBSERVED. *E
    PA   70*E
    AUP  FI10*E1089,0+/64*E
    EPP  450+/QUFI10*E
    NX   *E
    BR   PR1*E
    CAP  4,12,3,4+/CC*E
    SB   C6+/C6*E
    AD   1+/C3*E
    BR   PR2*E
    WAP  4,6,3,4+/W1*E
    WBP  4,18,3,4+/W3*E
    WBP  4,24,3,4+/W4*E
    BR   PR1*E
    UN   UU*E
    DT   28,7+/2,28+/33,7+/(T)OUCH ONLY A +,.*E
    PA   40*E
    DE   28+/2*E
    PR   *E
    AD   1+/C5*E
    AD   1+/C6*E
    BR   HCTR1+/C6+/E+/3*E
    BR   HCTR1+/C5+/E+/5*E

FI11*E
    PR   *E
    DE   0+/32*E
    FPI  111*E
    DT   0,5+/4,0+/40,0+/(W)HAT IS WRONG WITH THE PERFORMER'S*C*I
         INTERPRETATION OF THIS MELODY(/*E
    DT   7,5+/2,7+/35,5+/+, (T)HE TEMPO IS INAPPROPRIATE. *E
    DT   13,5+/2,13+/35,5+/+, (T)HE ARTICULATION IS INCORRECT. *E
    DT   19,5+/2,19+/35,5+/+, (T)HE DYNAMICS ARE UNOBSERVED. *E
    DT   25,5+/2,25+/35,5+/+, (T)HE RUBATO IS EXCESSIVE. *E
    PA   70*E
    AUP  FI11*E1098,1+/94*E
    EPP  450+/QUFI11*E
    NX   *E
    BR   PR1*E
    CAP  4,12,3,4+/CC*E
    SB   C6+/C6*E
    AD   1+/C3*E
    BR   PR2*E
    WAP  4,6,3,4+/W1*E
    WBP  4,18,3,4+/W3*E
    WBP  4,24,3,4+/W4*E
    BR   PR1*E
    UN   UU*E
    DT   28,7+/2,28+/33,7+/(T)OUCH ONLY A +,.*E
    PA   40*E
    DE   28+/2*E
    PR   *E
    AD   1+/C5*E
    AD   1+/C6*E
    BR   HCTR1+/C6+/E+/3*E
    BR   HCTR1+/C5+/E+/5*E

FI12*E
    PR   *E
    BR   PR2+/S3+/1*E
    LD   1+/S3*E
    DE   0+/32*E
    FPI  112*E
    DT   0,5+/4,0+/40,0+/(W)HAT IS WRONG WITH THE PERFORMER'S*C*I
         INTERPRETATION OF THIS MELODY(/*E
    DT   7,5+/2,7+/35,5+/+, (T)HE TEMPO IS INAPPROPRIATE. *E
    DT   13,5+/2,13+/35,5+/+, (T)HE RUBATO IS EXCESSIVE. *E
    DT   19,5+/2,19+/35,5+/+, (T)HE DYNAMICS ARE UNOBSERVED. *E
    DT   25,5+/2,25+/35,5+/+, (T)HE ARTICULATION IS INCORRECT. *E
    PA   70*E
    AUP  FI12*E1110,2+/86*E
    EPP  450+/QUFI12*E
    NX   *E
    BR   PR1*E
    CAP  4,6,3,4+/CC*E
    SB   C6+/C6*E
    AD   1+/C3*E
    BR   PR2+/C5+/G+/0*E
    AD   3+/C3*E
    BR   FI16*E
    WAP  4,12,3,4+/W2*E
    WBP  4,18,3,4+/W3*E
    WBP  4,24,3,4+/W4*E
    BR   PR1*E
    UN   UU*E
    DT   28,7+/2,28+/33,7+/(T)OUCH ONLY A +,.*E
    PA   40*E
    DE   28+/2*E
    PR   *E
    AD   1+/C5*E
    AD   1+/C6*E
    BR   HCTR1+/C6+/E+/3*E
    BR   HCTR1+/C5+/E+/5*E
    BR   FI9+/C5+/E+/1*E
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EXAMPLE OF STUDENT RECORDS 

Detailed information regarding any student's performance on the
IBM 1500 Instructional System is available through student records.

This example lists the performance records for six students on the
fourth item of the ORI scale. Information contained includes the code
number of the question (QUORI4 in this example), the code numbers of the
students, time elapsed between the end of the playing of the taped
musical example and the students' responses, the response code and
location of the students' responses, and the dates and times of the
responses.



COURSE  SEG  S    EP IDENT.  LATENCY  MATCH  DATE      TIME

MUTES   0    X27  QUORI4     39.5     CC     10/8/70   14:53.93
             RESPONSE - ROW 24 COL C5

MUTES   0    X28  QUORI4     12.1     W1     10/6/70   13:38.7
             RESPONSE - ROW 06 COL C5

MUTES   0    X29  QUORI4     11.3     CC     10/5/70   14:51.1
             RESPONSE - ROW 24 COL C5

MUTES   0    X30  QUORI4      4.0     CC     10/9/70   10:31.30
             RESPONSE - ROW 25 COL C5

MUTES   0    X31  QUORI4     13.3     CC     10/5/70   14:42.39
             RESPONSE - ROW 24 COL C5

MUTES   0    X33  QUORI4     12.5     W1     10/8/70   9:57.80
             RESPONSE - ROW 06 COL C5
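
The record layout described above lends itself to a simple data structure. The sketch below is illustrative only: the class and field names are hypothetical, the latencies are assumed to be in seconds, and the reading of the match codes (CC as the correct-answer identifier, W1 as a wrong-answer identifier) is taken from the identifiers used in the Coursewriter listing rather than from any record-format documentation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ResponseRecord:
    """One student record; field names are hypothetical."""
    student: str
    ep_ident: str
    latency: float  # assumed to be seconds after the taped excerpt ends
    match: str      # "CC" assumed correct, "W1" assumed wrong
    date: str
    row: int        # screen row of the light-pen response


# Values copied from the six example records for item QUORI4.
RECORDS = [
    ResponseRecord("X27", "QUORI4", 39.5, "CC", "10/8/70", 24),
    ResponseRecord("X28", "QUORI4", 12.1, "W1", "10/6/70", 6),
    ResponseRecord("X29", "QUORI4", 11.3, "CC", "10/5/70", 24),
    ResponseRecord("X30", "QUORI4", 4.0, "CC", "10/9/70", 25),
    ResponseRecord("X31", "QUORI4", 13.3, "CC", "10/5/70", 24),
    ResponseRecord("X33", "QUORI4", 12.5, "W1", "10/8/70", 6),
]

# Proportion of these six students whose response matched the correct answer.
difficulty_index = sum(r.match == "CC" for r in RECORDS) / len(RECORDS)

# Average response latency over all six records.
mean_latency = mean(r.latency for r in RECORDS)

print(f"p = {difficulty_index:.2f}, mean latency = {mean_latency:.2f}")
```

With only six records, the proportion computed here is merely an illustration of how an item difficulty index could be derived from such records; it is not a result reported in the study.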
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SELECTION OF TESTS FOR PROGRAMMING 

There were five tests from which to select in the pitch area. The
EN test was administered to a group of students too small for the
establishment of meaningful descriptive statistics. (See Chapter III,
pages 44-45.) The item difficulty indices tend to be weighted toward
the less difficult end of the MI scale (Table 1, page 63); the TC test
may have contained, at the time of empirical trials, unrealistically
difficult items (Chapter II, page 44). The HI test after trial was
found to contain a sparsity of easy (p > .70) items (Chapter III,
page 43). The ON test does not contain the problems associated with
the other pitch tests; furthermore, the melodies of the ON test are of
greater musical interest than isolated triads and intervals. Therefore,
the ON test was selected for programming.

Three tests were developed in the rhythm area; the strictly visual
IMS test was never administered (Chapter III, page 48). The ORI test
was selected for programming in preference to the RD test because the
ORI item difficulty indices are spaced at more nearly equal intervals
than the RD item difficulty indices (Table 1, page 63), and the raw
data obtained from the trial of the RD test might have been confounded
by student response patterns (Chapter III, page 45).

In the interpretation area, there were two tests from which to
select. The BP and FI tests were each successfully administered; the
descriptive statistics obtained were similar (Table 2, page 66).
However, each test was administered at two institutions, and the
rank-order correlation between difficulty scales obtained at the
respective pairs of institutions favored the FI test (Table 3,
page 70). Furthermore,
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the FI test, albeit subjective (Chapter III, pages 37-38), requires, in 
the opinion of the researcher, a broader range of thinking than the
identification of unmusical interruptions in the BP test, and the FI
test was selected for programming.

The HC(L) version of the HC test was selected for programming in
the style area primarily because it was the one test that had been
administered at The Pennsylvania State University. The HC(Y) version
evidently is dissimilar to the HC(L) version because the rank-order
correlation of the difficulty rankings for each test is not close to
1.00 (Chapter III, page 50). It was not possible to administer both
HC versions to the same students, hence the decision was made to program
the HC(L) version.
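
The rank-order correlations cited in this section compare how two administrations order the same items by difficulty. The sketch below shows one common way such a coefficient can be computed (Spearman's formula for untied ranks); whether the study used exactly this formula is an assumption, the function names are illustrative, and the difficulty indices in the example are hypothetical values, not data from the study.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(p_first, p_second):
    """Spearman rank-order correlation between two lists of item
    difficulty indices for the same items (untied-rank formula)."""
    n = len(p_first)
    r1, r2 = ranks(p_first), ranks(p_second)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical difficulty indices for five items at two institutions.
institution_a = [0.90, 0.80, 0.60, 0.40, 0.20]
institution_b = [0.85, 0.90, 0.50, 0.20, 0.30]
print(round(spearman_rho(institution_a, institution_b), 2))
```

A coefficient near 1.00 indicates that the two administrations order the items by difficulty in nearly the same way, which is the property the selection of the FI test and the comparison of the HC versions rest on.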
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computerized testing; criterion-referenced testing; incremental testing; 
music education; music testing; sequential testing; nonperformance musical
behaviors



IBM 1500 Instructional System

A prototype computerized criterion-referenced test of certain nonperformance
musical behaviors was developed at The Pennsylvania State University, with the
expectation that the test could provide a pattern for development in similar
situations.

A total of 783 criterion-referenced test items were administered to
undergraduates. Item difficulty indices were computed, and twenty-item scales,
arranged in order of difficulty, were selected for each of twelve subtests.
Four subtests were programmed for the IBM 1500 Instructional System.

A sequential or incremental programming strategy was adopted. A student
receives every fourth item of each twenty-item scale until he makes an initial
error. A reverse branch of three then occurs; the forward increment is changed
to one. Each subtest is terminated when three successive errors occur, a total
of five errors occurs, or the end of the scale is reached. Nonadministered
items are assumed to be correct if they are of less difficulty than the most
difficult correctly answered administered item.
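
The sequential strategy just described can be sketched as a short simulation. This is an illustrative reading, not the Coursewriter program itself: the function and parameter names are hypothetical, items are numbered 1 to 20 in order of increasing difficulty, the first administered item is assumed to be item 4 (the listing for item 12 branches forward to item 16 and back to item 9, which is consistent with this), and items already administered are assumed to be skipped when reached again.

```python
def run_subtest(answers_correctly, n_items=20, first_item=4):
    """Simulate one sequential subtest.

    answers_correctly(item) -> bool is a stand-in for the student's response.
    Returns the administered items, the two score components, and the score.
    """
    administered = {}        # item number -> answered correctly?
    total_errors = 0
    successive_errors = 0
    item, step = first_item, 4
    while 1 <= item <= n_items:
        if item not in administered:
            correct = answers_correctly(item)
            administered[item] = correct
            if correct:
                successive_errors = 0
            else:
                total_errors += 1
                successive_errors += 1
                # Terminate on three successive errors or five errors in all.
                if successive_errors == 3 or total_errors == 5:
                    break
                # Initial error: branch back three items, then step by one.
                if total_errors == 1:
                    item, step = item - 3, 1
                    continue
        item += step
    correct_items = [i for i, ok in administered.items() if ok]
    hardest_correct = max(correct_items, default=0)
    # Nonadministered items easier than the hardest correct item count as correct.
    assumed = sum(1 for i in range(1, hardest_correct) if i not in administered)
    answered = len(correct_items)
    return {
        "administered": administered,
        "answered_correctly": answered,
        "assumed_correct": assumed,
        "score": answered + assumed,
    }


# A hypothetical student who can answer items 1 through 10 only.
result = run_subtest(lambda item: item <= 10)
print(result["score"], sorted(result["administered"]))
```

For this hypothetical student the simulation administers eight items (4, 8, 12, 9, 10, 11, 13, 14), stops after three successive errors, and yields a score of 10: four items answered correctly plus six easier items assumed correct.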

While not statistically equivalent to an off-line version of the test, the
computerized test performs adequately from a qualitative standpoint. Refinement
by reordering of the items on the basis of more stable indices of difficulty is
recommended for quantitative improvement.




